Modern mass spectrometry-based methods provide an exciting opportunity to characterize protein expression in the developing embryo. We have employed an isotopic labeling technology to quantify the expression dynamics of nearly 6000 proteins across six stages of development in Xenopus laevis, from the single-cell zygote through the mid-blastula transition and the onset of organogenesis. Approximately 40% of the proteins show significant changes in expression across the development stages. The expression changes for these proteins naturally fall into six clusters corresponding to major events that mark early Xenopus development. A subset of experiments in this study quantified protein expression differences between single embryos at the same stage of development, showing that, within experimental error, embryos at the same developmental stage have identical protein expression levels.
Introduction: Xenopus laevis as a model organism
The African clawed frog, Xenopus laevis, is an attractive model organism for vertebrate developmental biology. The large size of the oocyte and embryo (∼1.2 mm diameter) provides large amounts of material for transcriptomic and proteomic analyses. In vitro fertilization yields large numbers of synchronized Xenopus embryos that mature outside the mother, again facilitating study.
Transcriptomic analysis has benefited first from the development of hybridization arrays and more recently from the use of next-generation sequencing technology (Robert, 2010; Lee-Liu et al., 2012), and has led to a growing literature on the dynamics of gene expression in oocytes and through early stages of vertebrate development in Xenopus and other model organisms (Flachsova et al., 2013; Nesan and Vijayan, 2013; Sylvestre et al., 2013; Aanes et al., 2014; Jambor et al., 2015). These studies provide details on biochemical changes that accompany normal fertilization and development, and should provide insight into defects that arise during development and the toxicology accompanying exposure to environmental contaminants.
However, it has become clear that transcript expression may not correlate well with protein expression. This disparity arises early in development before the onset of zygotic transcription when translation of maternal mRNA drives changes in protein expression. The expression of many stored maternal transcripts is activated by cytoplasmic polyadenylation that allows for translation at specific embryonic stages. This lack of correlation between message and protein can obscure important details of development.
Proteomic analysis has lagged behind transcriptomics. Proteomic studies suffer from two issues. First, there is no equivalent to the polymerase chain reaction for proteins, which hinders analysis of proteins expressed at low levels. Second, the chemical properties of proteins vary widely compared with those of mRNA, which leads to serious challenges in protein analysis.
While two-dimensional gel electrophoresis has played an important role in protein analysis, modern proteomic studies rely on liquid chromatography coupled with tandem mass spectrometry for detection and identification of proteins (Aebersold and Mann, 2003). In tandem mass spectrometry, a first stage of mass spectrometry is used to generate a precursor ion spectrum of the components eluting from the chromatograph. A protein or peptide with a specific m/z value is selected and fragmented in a collision cell within the mass spectrometer; the resulting fragment ion mass spectrum is used to identify the peptide or protein.
In top-down proteomics, intact proteins are separated and analyzed by mass spectrometry (Tran et al., 2011). Top-down proteomics provides a detailed view of a protein's structure and is the method of choice for detailing post-translational modifications and identifying protein isoforms (proteoforms). Unfortunately, top-down proteomics presents a number of experimental challenges. Chromatographic separation of complex mixtures of intact proteins can be extremely difficult, mass spectrometric analysis requires instruments with high mass resolution, and data analysis is not straightforward.
In bottom-up proteomics, protein mixtures are proteolytically digested, typically with trypsin, prior to analysis (Adkins et al., 2002). Tryptic peptides are much easier to separate than intact proteins, and tryptic peptides are particularly easy to analyze with high sensitivity using electrospray ionization-mass spectrometry. Database searching is used to identify peptides based on the fragment ion spectrum, and that information is used to infer the identity of proteins present in the sample (Eng et al., 1994).
Bottom-up proteomics also suffers from challenges. Digestion converts the complex protein sample into a very complex peptide mixture, and two or more stages of separation are typically used before mass spectrometric analysis. These separations often require several days of continuous instrument time. In addition, data analysis, particularly when searching for a number of post-translational modifications, compounds the processing time. Finally, not all peptides from a protein may be detected, so that a protein's post-translational modifications may not be completely characterized. Despite these challenges, bottom-up technologies are used for the vast majority of proteomic analyses.
The number of identified proteins depends on the amount of starting material. Milligrams of protein will allow identification of 10 000 or more proteins, whereas analysis of samples of just a few micrograms will identify a few thousand proteins, and the number of identifications decreases further as the amount of material decreases (Sun et al., 2013). Consequently, most bottom-up proteomic studies employ model systems that can generate relatively large amounts of proteins.
Conventional proteomic analyses provide a parts list for the sample. Additional technology is required to generate information on the amount of protein present in the sample (Bantscheff et al., 2007). Stable isotopic labeling is commonly used in quantitative proteomic experiments (Ong et al., 2002; Ross et al., 2004; Guo et al., 2007; Boersema et al., 2009). In these experiments, the ratio of protein expression in two samples is determined by incorporating heavy and light isotopes into the two samples. After labeling, the samples are pooled and subjected to chromatographic and mass spectrometric analysis. The relative abundance of the peptides labeled with different reagents is estimated from the parent or fragment ion mass spectrum, depending on the particular isotopic labeling chemistry.
Labels can be incorporated metabolically by use of isotopically labeled essential amino acids in cultured cells (Ong et al., 2002). While useful for cultured cells, SILAC (Stable Isotope Labeling by Amino acids in Cell culture) chemistry is not particularly useful in quantifying protein expression in most developmental biology systems, which do not grow in defined media.
Instead, chemical methods can be used to incorporate isotopic labels. In most cases, labels are covalently attached to primary amines, i.e. lysine residues and the N-terminus of peptides (Fig. 1). In the simplest case, normal and isotopically substituted formaldehyde and cyanoborohydride are used to introduce a characteristic isotopic signature to primary amines within the tryptic peptide, which is revealed in the parent ion spectrum as a set of doublets separated by the mass difference between the labels (Boersema et al., 2009). This dimethyl labeling approach is inexpensive and quite simple, and has been used to characterize changes in the phosphoproteome of wild-type zebrafish embryos and zebrafish embryos with morpholino-mediated knockdown of the Fyn/Yes kinases (Lemeer et al., 2008).
Dimethyl labeling has three limitations. First, because of the limited repertoire of isotopic reagents, protein expression changes for only two or three samples are typically compared in a single experiment. Second, incorporation of the isotopic label leads to a mass difference in tryptic peptides of several Daltons, which complicates the parent-ion mass spectrum in proportion to the number of isotopic channels, decreasing the number of peptide spectral matches that define the bottom-up analysis. Third, the isotopic substitution can lead to subtle shifts in chromatographic retention times, which can introduce modest differences in ionization efficiency for the two forms of the peptide.
A second class of labels uses sophisticated chemistry to introduce an isobaric label onto the peptide. These isobaric tags are designed to label four, eight, or more different samples with unique isotopic signatures; each signature is called a mass channel (Ross et al., 2004). Figure 2 presents an example of an iTRAQ reagent. iTRAQ reagents consist of three groups. The succinimidyl ester functional group reacts with primary amines to covalently attach the label to the peptide. The isobaric tag portion of the molecule is designed to incorporate a constant mass to peptides. The isobaric group consists of two parts. In the 4-plex version of the reagent, the reporter group is isotopically substituted to add between 114 and 117 Da to the label. The balance group is also isotopically substituted such that the isobaric tag has constant mass (145 Da) for all forms. The four different forms of the iTRAQ reagent are used to label four different samples. The isobaric tag adds a constant mass to the parent ion spectrum of peptides in the four samples.
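The mass bookkeeping for the 4-plex reagent can be sketched in a few lines of Python. The sketch uses only the nominal masses quoted above (reporter groups of 114 to 117 Da and a constant 145 Da isobaric tag); the constant and function names are illustrative and are not taken from any vendor software.

```python
# Nominal masses in Da, as quoted in the text; all names are illustrative.
ISOBARIC_TAG_MASS = 145
REPORTER_MASSES = (114, 115, 116, 117)


def balance_mass(reporter_mass, tag_mass=ISOBARIC_TAG_MASS):
    """Nominal mass the balance group must carry so that the tag stays isobaric."""
    return tag_mass - reporter_mass


for reporter in REPORTER_MASSES:
    balance = balance_mass(reporter)
    print(f"reporter {reporter} Da + balance {balance} Da = {reporter + balance} Da")
```

Because the reporter and balance masses always sum to the same value, a peptide labeled with any of the four forms appears at the same m/z in the parent ion spectrum.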
The isobaric tag is cleaved during fragmentation, generating a fragment ion spectrum that is identical to the fragment ion spectrum from an unlabeled peptide, facilitating database searching. Figure 3 presents a typical tandem mass spectrum generated from a tryptic peptide labeled with an 8-plex iTRAQ reagent. Fragmentation frees the reporter group, generating a set of peaks at low mass, one for each reporter group used to label the samples. The relative intensity of these peaks is proportional to the relative abundance of the peptide in the samples labeled with the different variants of the iTRAQ reagent. In Fig. 3, the signature at m/z = 113–121 is used to estimate the abundance of the peptide in eight samples.
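As a minimal sketch of how reporter-ion peaks translate into relative abundance, the following Python scales the reporter intensities from one tandem spectrum so that the channels sum to one. The intensities are invented for illustration, and the function name is hypothetical; real pipelines typically also correct for isotopic impurities in the reagents, which is omitted here.

```python
def relative_abundance(reporter_intensities):
    """Scale reporter-ion intensities so that all channels sum to 1."""
    total = sum(reporter_intensities.values())
    if total <= 0:
        raise ValueError("no reporter-ion signal in this spectrum")
    return {channel: intensity / total
            for channel, intensity in reporter_intensities.items()}


# Hypothetical 8-plex reporter intensities (arbitrary units), keyed by reporter m/z.
spectrum = {113: 1200.0, 114: 1150.0, 115: 2400.0, 116: 2300.0,
            117: 600.0, 118: 640.0, 119: 300.0, 121: 310.0}

fractions = relative_abundance(spectrum)
for channel, fraction in sorted(fractions.items()):
    print(f"m/z {channel}: {fraction:.3f}")
```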
In dimethyl, iTRAQ, and other isotopic labeling methods, an accurate measure of differences in a protein's abundance between samples is found by integrating the isotopic signals across a number of tryptic peptides from that protein. However, as one limitation, these methods tend to compress large differences in expression, and expression changes greater than 10-fold are seldom reported using isotopic labeling chemistry. This compression arises from limitations in the isolation of the parent ion for fragmentation; most mass spectrometers isolate peptides within a >1 Da window, and co-isolated ions can contribute to the iTRAQ signal for the target peptide.
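The compression effect can be illustrated with a deliberately simple model. The Python sketch below assumes that co-isolated ions add the same amount of reporter signal to both channels being compared; this equal-interference assumption, the function name, and the numbers are illustrative only and are not taken from the original study.

```python
def observed_ratio(target_a, target_b, interference):
    """Reporter ratio a/b when co-isolated ions add equal signal to both channels.

    Simplifying assumption: the interfering peptides are unchanged between
    the two samples, so they contribute the same intensity to each channel.
    """
    return (target_a + interference) / (target_b + interference)


true_ratio = observed_ratio(100.0, 10.0, 0.0)    # no co-isolation
compressed = observed_ratio(100.0, 10.0, 10.0)   # modest co-isolated signal
print(f"true ratio: {true_ratio:.1f}, observed ratio: {compressed:.1f}")
```

Under this model, any nonzero interference pulls the observed ratio toward 1, which is why large expression differences are underestimated.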
Alternatively, label-free chemistry can be employed to estimate protein abundance (Liu et al., 2004; Cox and Mann, 2008). In this case, the number of tryptic peptides identified from a protein and the parent ion intensity can be used to estimate abundance. These label-free methods tend to have wider dynamic range than methods based on labeling chemistry, but also tend to have much poorer precision and accuracy; these methods are best used to characterize gross differences in protein expression between samples. Label-free methods are also less ideal for isoform-specific quantification because their best accuracy arises from integration of several peptides per protein.
We have recently used iTRAQ chemistry to characterize protein expression changes during early stage development of Xenopus laevis embryos (Sun et al., 2014). We were interested in the evolution of protein expression during development, and studied embryos at stages 1, 5, 8, 11, 13 and 21. Stage 8 is the midblastula transition (MBT) for Xenopus, which marks the onset of zygotic transcription; protein expression changes before this stage are due to translation of maternal mRNA. Stage 13 is the gastrula-neurula transition, which is immediately followed by the onset of organogenesis.
Figure 4 presents the simplified experimental protocol. Embryos were individually lysed and digested, and each digest was labeled with one of the iTRAQ reagents. The peptides were pooled, separated by two stages of liquid chromatography, detected by tandem mass spectrometry, and analyzed by database searching for peptide identification combined with reporter ion integration for quantification.
Three experiments were performed. In two experiments, single embryos were taken for iTRAQ analysis. We used ∼30 µg of protein from each embryo for these analyses, and these experiments provided information on both the changes in expression during development and the consistency of expression for single embryos at the same stage of development. We used the 8-plex form of iTRAQ chemistry to compare protein expression between eight single embryos.
In our first experiment (E1), tryptic peptides generated from two embryos at stage 1 of development were labeled with the m/z 113 and 114 iTRAQ reagents, two embryos at stage 5 of development were labeled with the 115 and 116 reagents, two at stage 8 were labeled with the 117 and 118 reagents, and two at stage 11 were labeled with 119 and 121 reagents (Fig. 5). These labeled peptides were pooled and subjected to ion exchange prefractionation followed by reversed-phase liquid chromatography and tandem mass spectrometry. The resulting data were subjected to database searching.
Xenopus is a tetraploid organism, which leads to a large amount of gene duplication. In some cases, an identified peptide could come from two or more proteins; these proteins are combined in a single protein group.
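One simple way to form such protein groups is to merge every set of proteins that share an identified peptide. The Python sketch below does this with a small union-find structure. It is an illustration of the idea only, not the grouping logic of the search software used in the study, and the peptide sequences and protein names are hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Merge proteins that share at least one identified peptide into one group."""
    parent = {}

    def find(protein):
        parent.setdefault(protein, protein)
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]  # path halving
            protein = parent[protein]
        return protein

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        if not proteins:
            continue
        root = find(proteins[0])
        for other in proteins[1:]:
            parent[find(other)] = root

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical identifications: two duplicated gene copies share a peptide.
identified = {
    "PEPTIDEK": ["protA.L", "protA.S"],
    "ANOTHERK": ["protA.L"],
    "UNIQUER": ["protB"],
}
print(group_proteins(identified))
```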
Our initial searching algorithm used an incomplete Xenopus genomic sequence. A subsequent search using a more complete genome and improved search algorithms yielded 5757 protein groups, which were determined from 46 117 tryptic peptides and 205 401 spectral matches (Sun et al., unpublished). An average of about eight peptides was identified from each protein, which generated an average of nearly 50% sequence coverage. Figure 6 presents the cumulative distribution of coverage for experiment E3. Over 85% of proteins have >25% coverage, and the mean coverage was ∼50%. Each spectral match provides an independent estimate of the protein abundance from the eight embryos; an average of nearly 36 measurements was made on each protein's abundance. These highly redundant data provide a robust estimate of protein expression differences.
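Because each spectral match yields its own set of reporter intensities, the per-spectrum estimates must be combined into a single protein-level value. The Python sketch below takes the median of the per-spectrum log2 ratios between two channels. The median is one common, outlier-resistant choice and is an assumption here, not necessarily the summary statistic used in the original analysis; the function name and intensities are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratio(spectral_matches, channel_a, channel_b):
    """Median log2(a/b) over all spectral matches with signal in both channels."""
    ratios = [
        math.log2(match[channel_a] / match[channel_b])
        for match in spectral_matches
        if match.get(channel_a, 0) > 0 and match.get(channel_b, 0) > 0
    ]
    if not ratios:
        raise ValueError("no spectral match has signal in both channels")
    return median(ratios)


# Hypothetical reporter intensities for three spectral matches of one protein.
matches = [
    {113: 100.0, 117: 200.0},
    {113: 50.0, 117: 100.0},
    {113: 80.0, 117: 640.0},   # outlying spectrum, e.g. from co-isolation
]
print(protein_log2_ratio(matches, 117, 113))
```

With many spectral matches per protein, a single aberrant spectrum has little influence on the protein-level estimate.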
In our second experiment (E2), peptides were labeled from two embryos at stage 1, two at stage 5, two at stage 13, and two at stage 21. We identified and determined the relative expression of over 4000 protein groups in each of these experiments.
By comparing protein expression of duplicate embryos at the same stage of development, we were able to quantify both the precision of our measurement and the heterogeneity of protein expression at each stage of development. It is conventional to inspect the logarithm of expression ratios, rather than the ratios themselves. Figure 7 presents the histogram of the log2 expression ratio between two embryos at stage 1. The log2 distribution is centered at −0.01 with a standard deviation of 0.24, corresponding to a relative standard deviation of roughly 18% (2^0.24 ≈ 1.18) in expression of the proteins between the two embryos. Similar results were obtained for duplicate embryos at the other stages of development. These symmetrical distributions, centered at a log2 ratio of 0, demonstrate that the iTRAQ chemistry does not introduce bias in expression measurements. The tight distribution reflects the high precision of the measurement and the homogeneity of protein expression between two embryos at the same stage of development. We conclude from these data that our experimental protocol produces outstanding precision, and that the protein expression levels of single embryos at the same stage of development are identical within experimental error.
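The duplicate-embryo comparison can be sketched in Python as follows. The code computes log2 ratios for proteins quantified in both embryos, summarizes their center and spread, and converts a log2-scale standard deviation s into a relative standard deviation using the fold-change convention 2^s − 1. That convention, the function names, and the example intensities are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def log2_ratios(embryo_a, embryo_b):
    """log2(a/b) for every protein with positive signal in both embryos."""
    shared = embryo_a.keys() & embryo_b.keys()
    return [
        math.log2(embryo_a[protein] / embryo_b[protein])
        for protein in sorted(shared)
        if embryo_a[protein] > 0 and embryo_b[protein] > 0
    ]


def relative_sd_from_log2_sd(sd_log2):
    """Approximate relative SD implied by a standard deviation on the log2 scale."""
    return 2.0 ** sd_log2 - 1.0


# Hypothetical protein abundances (arbitrary units) for two same-stage embryos.
embryo_1 = {"protA": 100.0, "protB": 220.0, "protC": 50.0}
embryo_2 = {"protA": 110.0, "protB": 200.0, "protC": 50.0}

ratios = log2_ratios(embryo_1, embryo_2)
center, spread = mean(ratios), stdev(ratios)
print(f"center {center:+.3f}, SD {spread:.3f}, "
      f"relative SD {relative_sd_from_log2_sd(spread):.1%}")
```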
The third experiment pooled proteins from four embryos. Pooling served two purposes. It increased the amount of protein available for analysis, generating a modest increase in the number of identified proteins. Pooling also averages any heterogeneity in protein expression between embryos. We used four channels of iTRAQ chemistry to label proteins from pooled embryos; four stage 1 embryos were pooled and homogenized. Their tryptic peptides were labeled with the 113 iTRAQ reagent. Similarly, four stage 8 embryos were homogenized, and their tryptic peptides were labeled with the 115 iTRAQ reagent. Four stage 13 embryos were homogenized and their peptides were labeled with the 117 iTRAQ reagent, and four stage 22 embryos were homogenized and their peptides were labeled with the 119 iTRAQ reagent. The peptides were pooled and subjected to the same analysis protocol as in the other two experiments. These data tended to give a higher signal-to-noise ratio than the single embryo experiments, and form the basis of the following discussion. Each experiment generated over 4500 protein identifications, and the combination of the data from all three experiments resulted in identification of almost 6000 protein groups.
Evolution of protein expression during development
We compared protein expression changes at different stages of development. Roughly 75% of the proteins showed no significant change in expression, whereas 25% of proteins (∼1200) showed a significant change in expression. A detailed list of the protein expression levels can be found in the supporting information of our previous work (Sun et al., 2014).
We performed cluster analysis based on the log2 abundance changes. Figure 8 presents a clustergram for the ∼1200 proteins with significant expression change across the development stages. The data are normalized to the stage 1 intensity, which is uniformly colored in the figure. The clustergram shows that the proteomes of stages 1 and 8 are most closely related, and that the proteome of stage 21 shows the poorest correlation with that of stage 1 embryos. This clustering is expected, as later stage embryos express proteins associated with organ development.
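The normalization that precedes clustering can be sketched as follows: each protein's intensities, listed in developmental order, are divided by the first (stage 1) value and converted to log2, so that every profile starts at zero. The function name and intensities are hypothetical, and the clustering step itself is not reproduced here.

```python
import math


def log2_change_vs_first_stage(intensities):
    """log2 fold change of each stage relative to the first stage in the list."""
    reference = intensities[0]
    if reference <= 0:
        raise ValueError("the reference (first-stage) intensity must be positive")
    return [math.log2(value / reference) for value in intensities]


# Hypothetical reporter intensities for one protein, ordered by developmental stage.
profile = log2_change_vs_first_stage([400.0, 400.0, 200.0, 100.0])
print(profile)
```

A profile such as this one, flat at first and then falling by one log2 unit per step, would sit among the proteins whose expression decreases during development.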
The algorithm identified six clusters (clusters 1–6) with significant change, plus the large group of proteins with no significant expression change (cluster 0), as summarized in Fig. 9. Figure 10 presents the cluster profiles across the stages of embryo development.
Clusters 1 and 2 show a decrease in expression during development. Cluster 1 consists of proteins whose expression decreases monotonically across the stages, starting immediately after fertilization. A number of these proteins bind to maternal mRNA and modulate its repression. These proteins include Y box proteins (frgy 2 a/b), Zygote arrest protein, Maskin, and RAP55. The latter two proteins are components of a complex that is associated with the cytoplasmic polyadenylation element, which controls activation of masked mRNA by addition of a poly(A) tail. The loss of these proteins following fertilization is consistent with the observation that this mechanism of translational control is found largely in oocytes. The apparent leveling off of expression at stage 22 of development is likely an artifact of the limited dynamic range of the iTRAQ chemistry, and the proteins likely continue to show decreased expression at that stage.
Cluster 2 similarly shows a monotonic decrease in expression, but starting at the MBT (stage 8). This stage is associated with the onset of asynchronous cell division. Cluster 2 includes replication factors RecQ4 and Cut5, whose decrease in expression is responsible, in part, for lengthening of the cell cycle and the onset of asynchronous cell division (Collart et al., 2013). The replication licensing factor XCdc6 had a much more pronounced decrease in expression, and we demonstrated that overexpression of this protein can trigger apoptosis.
Clusters 3–6 show increased levels of expression following fertilization. Cluster 3 consists of proteins whose expression monotonically increases following fertilization. Proteins in cluster 3 are encoded initially by maternal mRNA before the MBT and show a smooth transition to zygotic transcripts following the MBT.
Cluster 4 consists of proteins that show a transient expression at the MBT. These proteins presumably are products of maternal mRNA, and their decrease after the MBT is associated with the decrease in maternal transcripts. These embryos are progressing to the gastrula stage where the three primary germ layers are organized. The decrease in protein expression may be a result of more tissue-specific expression as cells differentiate, as exemplified by Stat1 (Turpen et al., 2001).
Cluster 5 consists of proteins whose expression increases at the MBT. In many cases, this increase in expression is correlated with the onset of zygotic transcription at that development stage. Examples include alkaline phosphatase, fructose-1,6-bisphosphatase, fus, lin28a, U2 auxiliary factor 2, and YAP. Other cases are more complicated. Some proteins appear to be translated initially from maternal transcripts whose activation is delayed until the MBT. The BMP signaling agonist Twisted gastrulation (xTsg) exhibits a smooth increase in expression, which is seemingly accomplished by a transition from decreasing maternal transcripts to increasing zygotic transcripts (Oelgeschlager et al., 2000). Similarly, despite its appearance after the MBT, fibronectin relies on maternal, rather than zygotic, transcripts, since expression of this protein is not affected by inhibition of zygotic transcription (Lee et al., 1984).
Cluster 6 consists of proteins whose expression increases at the gastrula-neurula transition (stage 13). This transition immediately precedes the onset of organ formation. Proteins in cluster 6 include globin Y, multiple skeletal troponins and myosins, myosin 10 (neural cells), neurofilament protein, SPARC/osteonectin (cilia cells), and Na+-K+-ATPase (kidney).
This clustering procedure provides a valuable means of identifying those proteins whose expression changes are highly correlated during early development. We anticipate, but do not demonstrate, that those proteins with highly correlated expression profiles will have related function. Furthermore, the discordance between mRNA expression and protein expression suggests that reliance on transcript data may lead to erroneous conclusions about biological functions of gene products. Finally, we point out that our data are silent on post-translational modifications during development (McGivern et al., 2009). It is likely that phosphorylation, along with other modifications, adds an additional layer of control of protein function.
L.S.: sample preparation, data generation, data analysis, manuscript drafting; M.M.C.: data analysis and manuscript drafting; P.W.H.: data analysis and manuscript drafting; N.J.D.: data analysis and manuscript drafting.
This work was funded by the National Institutes of Health (R01HD084399).
Conflict of interest
We thank Dr William Boggess in the Notre Dame Mass Spectrometry and Proteomics Facility for his help with this project.