Genome-wide analysis reveals gene expression and metabolic network dynamics during embryo development in Arabidopsis

Embryogenesis is central to the life cycle of most plant species. Despite its importance, because of the difficulty associated with embryo isolation, global gene expression programs involved in plant embryogenesis - especially the early events following fertilization are largely unknown. To address this gap, we have developed methods to isolate whole live Arabidopsis thaliana embryos as young as zygote and performed genome-wide profiling of gene expression . These studies revealed insights into patterns of gene expression relating to: maternal and paternal contributions to zygote development; chromosomal level clustering of temporal expression in embryogenesis; and embryo-specific functions. Functional analysis of some of the modulated transcription factor encoding genes from our datasets confirmed that they are critical for embryogenesis. Furthermore, we constructed stage-specific metabolic networks mapped with differentially regulated genes by combining the microarray data with the available KEGG metabolic datasets. Comparative analysis of these networks revealed the network-associated structural and topological features, pathway interactions and gene expression with reference to the metabolic activities during embryogenesis. Together, these studies have generated comprehensive gene expression datasets for embryo development in Arabidopsis and may serve as an important foundational resource for other seed plants.


Introduction
Embryogenesis represents an important phase in the life cycle of flowering plants; it begins with the zygote, the product of fertilization of female gamete (egg cell) with male gamete (sperm cell). The developmental events that culminate in the production of a mature embryo from a single cell zygote are precisely coordinated and relatively conserved in the majority of angiosperms (Goldberg et al., 1994). During this phase of development, the embryonic body plan is laid out, starting with apical-basal polarity, followed by partitioning of the apical domain into cotyledons and the shoot apical meristem (SAM), and differentiation of the basal domain to form the hypocotyl and root apical meristem (RAM). This overall theme is conserved among diverse plants although some species-specific differences also exist. This embryonic program, in coordination with the developing endosperm and seed coat, plays a central role in defining many of the key aspects of seed development and diversity, and this impacts many agronomic traits in crops (Huh et al., 2007;Braybrook and Harada, 2008).
Significant advances have been made in studying and understanding early embryogenesis in animal model systems because of the ease of isolating embryos (Carroll et al., 2005). In plants, fertilization and further embryo development occurs within the ovule and this makes it difficult to isolate embryos at very early stages (Wei and Sun, 2002). Thus the molecular events associated with early stages of plant embryogenesis are largely unknown. Without such information, it is not possible to advance the knowledge to a level comparable to that obtained by system level integrative approaches in yeast and animal models (Ihmels et al., 2004;Cui et al., 2007). Arabidopsis thaliana is the best studied model angiosperm with several genetic resources and databases (Alonso and Ecker, 2006). Exploiting these resources requires overcoming the challenges in accessing the diminutive embryos. In this study, we report successful isolation and generation of global gene expression datasets of Arabidopsis embryos at key stages of development, determination of the metabolic networks and their dynamics during embryo development, and experimental validation of inferred key nodes. Furthermore, cases of higher order regulation, chromosomal regiocentric coordination of gene expression, maternal and paternal gamete contribution to early stages of embryo development are highlighted. Together, the results provide a comprehensive view of gene expression patterns, new regulatory insights and metabolic network models for embryogenesis in Arabidopsis and these lay the foundation for further dissection of individual aspects of embryogenesis.

Isolation of developing embryos from Arabidopsis seeds
Isolation of single-cell zygotes has been a major challenge in plants. This is particularly the case in Arabidopsis because of the small size of the ovules at fertilization (160 x 140 µm; Fig. 1A) and the embryos within (i.e., elongated zygote size is 40 x 13 µm; Fig. 1B1). We developed a method for isolation of whole, live embryos from the fertilized ovules at various key stages including the zygote. Briefly, two incisions were made at the micropylar end of the ovule followed by careful micro-dissection of intervening cells exposing the central part of the micropylar tube where the transparent embryo is seated; the embryo is then released intact by gentle manipulation of the micropylar tube with fine forceps ( Fig. 1A and B). Following this procedure, embryos at the other early stages could also be obtained. Embryos at torpedo stage and thereafter were isolated by slicing the ovule and releasing the embryo with forceps. Structural integrity of these isolated embryos was randomly verified by critical examination with Nomarski microscopy (Fig. 1B). Due to the difficulty in differentiating adjacent embryo stages under the dissecting microscope, the collected samples represent pooling at early stages: Z, zygote (  Table S22) and generated a large dataset based on the global gene expression levels and differentially expressed genes at all stages of embryo development (Fig. 2, Supplemental Fig. S4 and Supplemental Table S1).
A searchable database of our data for digital gene expression in Arabidopsis embryo development is available at http://www2.bri.nrc.ca/plantembryo/ (Supplemental Fig. S1b). Cumulatively, ~78% of the genes were expressed during the course of embryogenesis. Prolific genome activity was evident even at the earliest zygote stage where 58% of genes were expressed (Supplemental Table S7). Just prior to the developmentally important transition from globular to heart stage (G H) in which the primordial body plan is established, the largest fraction (62%) of the genes were expressed. Near maturity, fewer genes (55%) were expressed despite the embryo containing more cells (Supplemental Table   S7). Studies using laser-capture micro-dissection (LCM) and microarray approaches have also estimated expression of comparable number of genes during embryogenesis in Arabidopsis and soybean (Casson et al., 2005;Le et al., 2007;Le et al., 2010). Cluster analyses of the differentially expressed genes between stages revealed unique patterns associated with biological activities operating during different developmental phases of Arabidopsis embryogenesis.
For example, the Z and Q stages clustered closer than with other stages that are temporally distal (Fig. 2).
Gene expression patterns correlate timing of developmental events and transcription factor (TF) genes are robustly modulated during early embryogenesis On examining the genes that function during cell cycle, primary metabolism, storage reserve synthesis, auxin and ABA biosynthesis and signaling in our datasets, we found discrete expression patterns that are summarized in Supplemental Fig. S2. Highlights of these results will be presented in subsequent sections. Within each stage, expression patterns that were less shared with adjacent stages and therefore deemed unique to the given stage were found.
Additionally, overlapping functional gene groups were identified within Z and Q  Table S5). Our study with isolated embryos revealed a clear separation between the early and late stages of embryogenesis ( Fig. 2), and this is consistent with a previous study that clustered globular and heart stages together while later stages showed divergent clustering (Spencer et al., 2007).
Although ribosomal proteins are largely considered as constitutively expressed and housekeeping in function, we observed dynamic regulation (Supplemental Zygote and quadrant stages represent the earliest events in embryogenesis. In these two stages, 3162 genes are modulated, of which 1630 were up-regulated and 1532 were down-regulated respectively in the Z stage relative to the Q stage. Further analysis of the genes expressed in the Z stage revealed that transcription factors (TFs) represented 7.5% of the modulated genes (Supplemental Table S8a). Notable among these is the zinc finger group. Only 4 -5% of the modulated genes are TFs in the later stages of embryo development.  Table S8b). These genes include MOM1, CHR34, RPD3A, VRN2 and ATX1 that are modulated during the early stages of embryogenesis, suggesting their potential key roles in post-fertilization.

Maternal and paternal gene expression programs in zygote and quadrant stage embryos
The gene expression programs of the parental gametes play important roles in fertilization and zygote development. The state of chromatin (Schaefer et al., 2007), activation of the zygote-specific gene expression (Stitzel and Seydoux, 2007), involvement of maternal transcripts in the initiation and maintenance of the zygote, and maternal to zygote transition (MZT) (Baroux et al., 2008) are all important. Our knowledge of these processes in plants is still at infancy. We used published gametophyte-enriched microarray datasets (Becker et al., 2003;Honys and Twell, 2003;Yu et al., 2005) to discern any patterns of gene expression common to the zygote and the male or female gametophytes. Unless parents with expressed sequence polymorphisms are used and the hybridization platform is allele-specific, parent-of-origin cannot be determined in such analyses. With this limitation in mind, our analysis showed that gamete and zygote expression shared a large number of genes at the two early stages (Z and Q) and such an overlap receded in subsequent embryo stages. We found that 712 of the genes expressed in a female gametophyte-enriched manner are also expressed in the early stages (Z and Q). Notably, the corresponding number for male gametophyte expressed genes was 51% (239) (Supplemental Table S2). If allele-specific expression was the underlying cause, the female genome would be contributing more than the male genome (Vielle-Calzada et al., 2000) but the contribution of the male genome would also be significant. Obviously, the data available above do not permit resolution of pre-existing and de novo transcripts of parental genome origin. To address the question of parent-of-origin gene expression pattern in the zygote, we performed qRT-PCR analysis for 14 gene targets (9 pollen enriched, 5 embryo-sac enriched). The results confirmed their predicted specific expression in paternal or maternal parent as well as in the zygote (Supplemental Table S8d). Furthermore, GUS-reporter developed with one of the pollen enriched genes (At3g28780) showed pollen-specific expression that continued in the pollen tube and after fertilization in the zygote ( Fig. 3A-C).
The GUS reporter construct for the At3g28780 gene was developed using a 1.98 kb putative promoter and introduced into Arabidopsis Col to generate transgenic lines. Pollen enrichment of this gene is consistent with the findings of a previous pollen proteome study (Holmes-Davis et al., 2005). Reciprocal crosses of the GUS line with wild type confirmed that the GUS expression in the zygote is paternally derived (Fig. 3B-E). The functional significance of paternal contribution is evident from a recent finding of paternally controlled embryo patterning as determined by SHORT SUSPENSOR (SSP) expression in Arabidopsis (Bayer et al., 2009). A GFP reporter assay with a maternally expressed (embryo sac enriched) gene (At4g07410) selected from this study showed expression in the female gametophyte but not in the pollen (Fig. 3F-H). The GFP reporter construct represents a translational fusion at the C-terminus of At4g07410 ORF (1.95 kb putative promoter of this gene is used). When the GFP reporter was introduced paternally, expression was observed in the zygote but not in the pollen, suggesting de novo transcription after fertilization ( Fig. 3I and 3J). Together, these observations suggest that some of the predominantly or specifically expressed genetic programs in the contributing gametes are retained after fertilization in the zygote indicating expression state of the corresponding genes after fertilization is influenced by contributing gametic programs. The gene expression patterns of zygote and quadrant stage embryos inferred from this study will likely include maternally and paternally inherited transcripts and/or de novo expression. These findings likely have implications in reprogramming of the parental genomes of egg and sperm cell to adopt the zygotic/embryonic program as suggested in animal studies (Tadros and Lipshitz, 2009). The results from this study also suggest that similar mechanisms likely govern early zygotic fate in plants and animals (Baroux et al., 2008). The observations from this study will provide important leads to investigate further biological implications of parental reprogramming after fertilization in plants.

Higher order gene expression patterns
To explore the dynamic gene expression patterns across embryo stages, we compared the modulated genes between adjacent embryonic stages. We found that most of the modulated genes (44%) are embryo stage transition-specific genes, implying that those differentially expressed genes are significant to that stage and then remain neutral for the rest of embryogenesis (Supplemental Table S9). To further analyze gene expression patterns across embryogenesis, as described previously (Yu et al., 2007), we classified the expressed genes into 7 groups according to the number of embryonic stages. We found that most of the genes were expressed across multiple embryonic stages, 73% of the genes were expressed in more than 5 stages (Supplemental Table S3 and S4). These results suggest that many genes and their associated biological processes are shared by different embryo stages. Similar observations were made in a recent independent study using seeds at different developmental stages (Le et al., 2010). To test if the observed gene expression patterns were associated with specific chromosomes, we further examined the distribution of these 7 groups on five Arabidopsis chromosomes. This analysis suggests that broadly expressed genes during embryogenesis were enriched on chromosomes 1 and 2, whereas, stage-specific genes were predominantly distributed on chromosomes 3, 4 and 5 ( Fig. S5a and S5c). Consistently, more modulated genes can be found on chromosomes 3, 4 and 5 compared to chromosomes 1 and 2 (Fig. S5b). To test if these expression patterns observed in Arabidopsis embryos is a general phenomenon across organisms, similar analysis was performed using microarray datasets available for Caenorabditis elegans embryo development (Hill et al., 2000). Interestingly, a similar chromosome-specific distribution of higher order gene expression was also observed in C.elegans (Fig. S5c). We next investigated the highly expressed gene clusters and their neighbors on chromosomes in Arabidopsis. Statistical analysis of the highly expressed gene clusters (see Supporting information-Methods) showed that chromosome 5 is enriched for such clusters (Supplemental Table S10 and S11). Furthermore, comparing these clusters across stages showed that zygote and quadrant stage embryos shared most of these clusters compared to the later embryo stages, suggesting that most of the clusters (6/15) are stage-specific (Supplemental Table S11). Together, these analyses suggest co-regulation at higher order level, viz., the chromosomal or chromosomal segment level. The significance of chromosomal context for co-regulated gene clusters observed in this study may have similar implications to gene expression as highlighted for animal systems in a recent review (Cremer and Cremer, 2010).

Validation and functional analysis
We validated the microarray data by qRT-PCR of randomly selected genes and the results showed good correlation with the microarray studies (Supplemental Table S8e). We conducted similar correlation analysis using virtual datasets of Genevestigator (Zimmermann et al., 2004) and the eGFP browser (Winter et al., 2007). Unlike our dataset, these datasets were generated from developing whole seeds (Hennig et al., 2003;Schmid et al., 2005;Winter et al., 2007). For the majority of the genes analyzed, especially those active in late embryogenesis, the expression patterns were comparable (Supplemental Fig.   S2a). In addition to the above analyses, we isolated putative promoters of some differentially expressed genes and tested their expression patterns with a GUS reporter. These studies showed GUS expression in the embryos consistent with the corresponding microarray datasets (Fig. 4). Together, these validation studies provided experimental evidence for the microarray datasets generated in this study.
We selected 22 genes encoding TFs that were modulated at different stages of embryo development for functional characterization (Fig. 5, Supplemental Table   S8c). Loss-of-function screens for T-DNA insertion lines for these genes did not produce detectable phenotypes, presumably due to functional redundancy. In another approach, we used the Drosophila Engrailed (En) repressor domain which was shown to produce dominant negative phenotypes in Arabidopsis (Markel et al., 2002): 14 of the 22 TF constructs showed a range of strong embryo phenotypes (Fig. 5, Supplemental Table S8c), and the rest showed weaker embryo, endosperm and in some cases post embryonic phenotypes (Supplemental Table S8c). Among these, five putative zinc finger-encoding genes that are modulated in Z and Q stages caused developmental arrest at early stages of embryogenesis. These include, developmental arrest at two, four, eight cell stage and additionally also displayed abnormal cell divisions further affecting basal lineage, suspensor and endosperm development (Fig. 5). These phenotypes suggest that the putative zinc finger transcription factors are likely redundantly involved in conferring important functions during early phases of embryogenesis in Arabidopsis. Interestingly, recent studies have also implicated zinc finger transcription factors in the early embryo development in Drosophila (Liang et al., 2008). We have also observed mutant embryo phenotypes with BELL1 LIKE, WOX6, TCP16 and SCL7 TFs ( Fig. 5 and Supplemental Table   S8c). These dominant phenotypes observed for some of the TFs suggest that their expression and functions may be critical for embryo development.  Table   S6; and Supporting information -Methods). We focused on the genes associated with biochemical pathways in fatty acid, carbohydrate, amino acid, nucleotide, vitamin metabolism as well as TCA cycle. The six stage-transition networks were constructed by mapping the modulated genes of adjacent stages onto the network. As shown in Supplemental Fig. S6, the resulting networks suggest that gene regulation between adjacent stages is highly dynamic during embryogenesis (Supplemental Table S13). We studied the gene regulatory patterns in the networks by examining the network structure characteristics, i.e., upstream nodes, downstream nodes, hubs, and cutpoints (see Supporting information-Methods). We identified 71 "network hubs" that represent intersections of many pathways which are preferentially regulated in most stage transitions (Supplemental Table S12b and S14) whereas many cutpoints and their downstream nodes are least preferentially regulated (Supplemental Table   S15). The gene co-expression network modules, clusters of nodes that are either up-or down-regulated in the networks, were examined using modulated genes in The top 10% of the highly expressed genes across all embryo stages involved in metabolism was mapped onto the network to identify the network core (Supporting information -Methods). The network core contains 28 connected nodes (reactions), most of which are involved in glycolysis, pentose phosphate pathway, pyruvate metabolism and carbon fixation (Supplemental Table S16) indicating the essential role of these nodes in core metabolic activities that operate during embryogenesis (Supplemental Table S17). A significantly larger number of links was found between nodes in the network core (Supplemental Table S18) which is consistent with the overrepresentation of hubs in the core (i.e., 46 % of the core nodes are hubs; hubs are highly connected nodes representing reactions that share metabolites -see Supplemental Table S12b).

Dynamic regulation of metabolic networks
These characteristics of the network core indicate that metabolites can be easily converted in both directions in the core. Interestingly, the stage transitions between Q and T stages showed higher number of up-regulated nodes than the others (Supplemental Table S19), highlighting the coordinated elaboration of metabolic pathways during embryo development.

Gene-regulation relationships between the network core and its periphery
Examination of the fractions of differentially regulated (modulated) nodes in nthorder neighbors from the network core in the transition networks (Supporting information-Methods) revealed a positive correlation between the increase of modulated nodes in the core and their neighbors (Supplemental Table S20), indicating that they are coordinately regulating metabolic activities. Closer look at the up-and down-regulated pathways in these transition networks revealed that fatty acid biosynthesis and other metabolism-related genes were significantly upregulated in the transition networks, viz., H T and T B (Supplemental Table   S21a and b). Furthermore, we observed that carbohydrate metabolism and biosynthesis genes were up-regulated relatively earlier than fatty acid biosynthesis, whereas the protein synthesis genes were up-regulated relatively later than fatty acid biosynthesis (Supplemental Table S21a and b). However, fatty acid, carbohydrate metabolism and biosynthesis genes were all significantly up-regulated from the G and H to B embryo stages consistent with previous reports (Girke et al., 2000;Schmid et al., 2005).

Embryo-defective mutations maps to end nodes
To gain biological insights into the metabolic network models, we selected 52 of the known Arabidopsis EMBs (embryo-defective mutants) [358 EMB listed at www.seedgenes.org (Tzafrir et al., 2003)] associated with metabolic pathways that cause embryo lethality (Tzafrir et al., 2003;Tzafrir et al., 2004) and mapped them onto the metabolic networks (Supplemental Table S12a). By calculating the enrichment of different node types (Supplemental Table S12b), we found that the end nodes (i.e., the last reaction in a pathway) are enriched for EMBs (Supplemental Fig. S6). These EMBs are associated with biosynthetic and metabolic pathways of fatty acids, carbohydrates, nucleotides, and amino acids. Therefore, lesions in the genes associated with the end nodes are not compensable by alternative feedback from neighboring pathways and can result in embryo lethality and perturb embryo development [ Supplemental Fig. S8, www.seedgenes.org (Kajiwara et al., 2004)]. However, interestingly EMBs associated with hub nodes display patterning defects (e.g. acc1) suggesting that these metabolic lesions also likely impact on embryo developmental programs.

Conclusions:
In this study, we have successfully isolated live zygote to late stage embryos of Arabidopsis and studied global gene expression patterns that regulate embryogenesis. This represents the first genome-wide gene expression profiling of embryogenesis in Arabidopsis and in plants, and the data presented here will serve as foundational resource for future studies addressing fundamental molecular and developmental mechanisms that govern plant embryogenesis.
The entire dataset (Supplemental Table S1 Arabidopsis embryogenesis will contribute to future plant embryo research.

Plant Materials and Growth Conditions
Arabidopsis thaliana ecotype Columbia was used in this study. Plants were grown under 16h light / 8h dark photoperiod with constant temperature of 22 0 C at 120 µ E m -2 s -1 light intensity.

21
Embryo isolations were performed as described in Fig. 1  RNAs were used to make probes for the microarray experiments as well as for qRT-PCR analysis (Supplemental Table S22). Details of microarray experiment are provided in supporting information -methods.

Constructs used for functional analysis
Open reading frames of selected genes were cloned, sequenced for confirmation, and the dominant-negative constructs (p35S:En-gene-Etag) (see support information-methods) and over-expression constructs (p35S: gene) were made and transformed into Arabidopsis wild type Col as described in (Yang et al., 2009).

qRT-PCR
Embryo isolation was performed as described in Fig. 1 and total RNA samples were isolated from different embryo stages as described in RNAqueous-Micro kit (Ambion, Catalog# 1927) and the respective double strand cDNAs were produced using amplified aRNA following the protocol of MessageAmpTM II aRNA Kit (Ambion, Cat.1751). Gene-specific primers were designed using Primer 3 software. qRT-PCR reactions were performed using the protocol and equipment of Applied Biosystem Step One.

Microscopy
Isolated embryos representing different key stages of Arabidopsis embryogenesis were prepared by clearing in chloral hydrate solution (Christensen et al. 1998) and were viewed with a Leica DMR microscope.

Microarray, network and statistical analysis
Microarray data were processed using Limma package (Smyth, 2004). Metabolic network construction, analysis and randomization tests were followed as described previously (Wang and Purisima, 2005;Tibiche and Wang, 2008).
Details of these analyses are provided in supporting information -methods.    (F) or in the pollen tube (G) but showed expression in the female gametophyte (H). The inserts in the G (1 and 2) showed DAPI staining of sperm nuclei (1) but no GFP signal (2). The GFP signal was also detected in the unfertilized ovule and specifically in the embryo sac nuclei, viz., egg cell, 2 synergids, 2 polar nuclei and 3 antipodals (embryo sac -yellow outline) (H). When the pollen from the GFP reporter line (no detectable expression of GFP) was crossed with wild type as female, the GFP signal was observed in the zygote (yellow outline) and the endosperm nuclei (red stars) (I). In the reciprocal cross where the pollen from wild type was crossed into transgenic reporter line as female, GFP expression was observed in the zygote (J). Bar=0.05mm (A-E, H and I) and 0.01mm (F, G and J).