Autocatalytic Processing and Substrate Specificity of Arabidopsis Chloroplast Glutamyl Peptidase

Chloroplast proteostasis is governed by a network of peptidases. As a part of this network, we show that Arabidopsis (Arabidopsis thaliana) chloroplast glutamyl peptidase (CGEP) is a homo-oligomeric stromal Ser-type (S9D) peptidase with both exo- and endo-peptidase activity. Arabidopsis CGEP null mutant alleles (cgep) had no visible phenotype but showed strong genetic interactions with stromal CLP protease system mutants, resulting in reduced growth. Loss of CGEP upregulated the chloroplast protein chaperone machinery and 70S ribosomal proteins, but other parts of the proteostasis network were unaffected. Both comparative proteomics and mRNA-based coexpression analyses strongly suggested that the function of CGEP is at least partly involved in starch metabolism regulation. Recombinant CGEP degraded peptides and proteins smaller than ~25 kD. CGEP specifically cleaved substrates on the C-terminal side of Glu irrespective of neighboring residues, as shown using peptide libraries incubated with recombinant CGEP and mass spectrometry. CGEP was shown to undergo autocatalytic C-terminal cleavage at E946, removing 15 residues, both in vitro and in vivo. A conserved motif (A[S/T]GGG[N/G]PE946) immediately upstream of E946 was identified in dicotyledons, but not monocotyledons. Structural modeling suggested that C-terminal processing increases the upper substrate size limit by improving catalytic cavity access. In vivo complementation with catalytically inactive CGEP-S781R or a CGEP variant with an unprocessed C terminus in a cgep clpr2-1 background was used to demonstrate the physiological importance of both CGEP peptidase activity and its autocatalytic processing. CGEP homologs of photosynthetic and nonphotosynthetic bacteria lack the C-terminal prosequence, suggesting it is a recent functional adaptation in plants.

Chloroplast glutamyl peptidase activity was initially detected in stroma of spinach (Spinacia oleracea) chloroplasts and the native mass of the activity was between 350 and 380 kD; however, the protein was not identified (Laing and Christeller, 1997). The enriched stromal fraction was able to very slowly (>1-2 d) cleave insulin (10 kD) on the C-terminal side of Glu, but Rubisco, RNase, and casein were not cleaved. The enriched fraction efficiently cleaved the synthetic peptide carbobenzoxy-Leu-Leu-Glu-naphthylamide after Glu (Laing and Christeller, 1997). A glutamyl endopeptidase activity was also isolated from cucumber (Cucumis sativus) leaves, and partial amino acid sequencing of a 97-kD protein suggested that it was a homolog of Arabidopsis CGEP (Yamauchi et al., 2001). Incubation with a small set of synthetic substrates showed preferential cleavage C-terminal of Glu. Finally, a chloroplast stromal fraction from pea (Pisum sativum) was reported to cleave the recombinant N terminus of LHCII after Glu residues and CGEP was identified as the peptidase (Forsberg et al., 2005). However, the CGEP-enriched stromal fraction could not degrade native LHCII. Collectively, these three studies suggest there is an active stromal glutamyl peptidase in chloroplasts, warranting further characterization. So far, no loss-of-function mutants for CGEP have been described and it is not known if CGEP has any genetic interactions with other chloroplast proteases.
This study provides a comprehensive functional analysis of Arabidopsis chloroplast CGEP addressing both in vitro activities and in vivo physiological relevance. Comparative proteomics and genetic interactions show that CGEP is part of the chloroplast proteostasis network. Both proteomics and coexpression analyses suggest that CGEP is directly or indirectly involved in the regulation of starch metabolism. Tandem mass spectrometry (MS/MS)-based analysis of the peptidase activity of recombinant CGEP (rCGEP) incubated with large peptide libraries (using the proteomic identification of protease cleavage sites [PICS] technique; Schilling et al., 2011; Biniossek et al., 2016) shows that CGEP specifically cleaves after Glu residues without any observable effects of neighboring residues. CGEP can digest peptides and small proteins, but not larger proteins. Surprisingly, Arabidopsis CGEP autocatalytically cleaves 15 residues from its C terminus both in vitro and in vivo, and based on homology structural modeling and experimentation, this cleavage enhances substrate entry into the catalytic pocket. In vivo complementation demonstrates that CGEP autocatalytic C-terminal processing and CGEP peptidase activity play functional roles in the chloroplast proteostasis network.

CGEP Is a Highly Conserved Protein in Plants
Phylogeny of 41 CGEP homologs (protease family S9D) across the species tree-of-life showed two major clades: one with photosynthetic eukaryotes, and one with cyanobacteria, proteobacteria (α, β, and γ), and flavobacteria (within the FCB group; Fig. 1A; Supplemental Table S1; Supplemental Dataset S1).
Conserved key features include the catalytic triad (Ser-781, Asp-855, and His-889; numbering for Arabidopsis CGEP) and the GGHSYGAF signature sequence for S9D peptidases (Supplemental Fig. S1). We did not identify CGEP homologs in the sequenced gymnosperms (Picea glauca, Picea abies, and Pinus taeda), the glaucophyte Cyanophora paradoxa, or archaea. Moreover, CGEP homologs were not observed in nonphotosynthetic eukaryotes (e.g. yeast [Saccharomyces cerevisiae], Neurospora spp., humans [Homo sapiens], Drosophila spp.). CGEP is a single gene family in most photosynthetic species, including Arabidopsis, but some angiosperms have two homologs (e.g. cotton [Gossypium spp.] and rice [Oryza sativa]; Supplemental Table S1). Sequence alignment of angiosperm (monocot and dicot) CGEP homologs revealed high sequence identity for most of the protein, with the exception of the predicted chloroplast transit peptides (cTPs) and C-terminal region (Supplemental Fig. S1, A and B). Most of the angiosperm CGEP homologs exhibited a predicted cTP, whereas three others were found to lack the C-terminal portion (Egr2, Mtr2, and Osa2). These missing regions are likely due to incomplete genome sequencing and/or mistakes in the assembly. The phylogeny suggested that CGEP in photosynthetic eukaryotes originates from an early progenitor of plants and algae, but not directly from endosymbiosis with cyanobacteria.

CGEP Forms Homo-Oligomers in Chloroplast Stroma
CGEP was detected in previous proteomics studies of leaf and chloroplast samples in both maize and Arabidopsis (Huang et al., 2013), as can also be viewed through the Plant Proteome Database (PPDB) at http://ppdb.tc.cornell.edu/. Using a newly generated anti-CGEP polyclonal serum, SDS-PAGE and immunodetection showed that Arabidopsis CGEP is ~100 kD, localizes nearly exclusively to the chloroplast stroma, and is not present in the chloroplast membrane fraction (Fig. 1B). The native size of CGEP in Arabidopsis was determined by size exclusion chromatography of the stromal proteome, followed by SDS-PAGE and immunodetection (Fig. 1C), as well as native PAGE (Fig. 1D). This showed that native Arabidopsis CGEP forms complexes, consistent with observations for the cucumber homolog (Yamauchi et al., 2001). To determine if CGEP forms stable interactions with other proteins, extensive affinity purifications with anti-CGEP serum using isolated stromal proteome samples from Arabidopsis were carried out (Supplemental Text S1). All experiments yielded strong enrichment of CGEP (based on sequence coverage and protein scores), but enrichment analysis did not identify obvious interacting proteins (see Supplemental Text S1). We therefore conclude that Arabidopsis chloroplast CGEP is a soluble stromal protein accumulating as homo-oligomers. Previously, we determined that the relative abundance of CGEP in rosette leaves was in the same range as the CLPR subunits, CLPB3 and cpHSP90 (Zybailov et al., 2008).

In Vivo CGEP Loss-of-Function Mutants and Genetic Interaction of CGEP with the Chloroplast CLP Protease
To determine the in vivo CGEP function, we identified three T-DNA Arabidopsis mutants, namely cgep-1 (SAIL_574_D03), cgep-2 (SALK_066117), and cgep-3 (SAIL_589_G08), from the Arabidopsis Biological Resource Center (Fig. 2, A and B). Reverse transcription PCR (RT-PCR) analysis showed that expression of the CGEP gene was undetectable in all three mutants (Fig. 2C) and, consistently, immunoblotting of total leaf extracts showed a complete loss of CGEP protein accumulation (Fig. 2D). Thus, cgep-1, cgep-2, and cgep-3 were considered null mutants for CGEP. None of these lines showed obvious visible phenotypes as compared to wild type when grown under standard growth-chamber conditions (Fig. 2B) or after high-light or drought-stress conditions (Supplemental Fig. S2, A and B). To further probe for a physiological role of CGEP in growth and development, we crossed the cgep-2 allele with partial loss-of-function mutants in the essential CLP protease system. cgep-2 was crossed with clpr2-1, a partial loss-of-function CLPR2 T-DNA mutant (Rudella et al., 2006), and the double mutant clpt1-2 clpt2-1. Homozygous progeny of these crosses were identified in the F2 populations (Fig. 2E; Supplemental Fig. S3). The visible growth phenotypes indicated strong genetic interactions between CGEP and clpr2-1 (Fig. 2E) as well as the clpt1 clpt2 double mutant (Supplemental Fig. S3). Immunoblotting confirmed the complete lack of CGEP in all plants with the cgep-2 allele (Fig. 2F). The rosette diameters and fresh weights of rosette leaves of cgep-2 clpr2-1 and cgep-2 clpt1-1 clpt2-1 plants were significantly smaller than those of clpr2-1 and clpt1-1 clpt2-1 plants, respectively (Fig. 2G). By contrast, homozygous progeny (F2) of crosses of cgep-2 with var2-1, a partial loss-of-function allele of thylakoid FTSH2 (Chen et al., 2000), did not show an obvious synergistic visible growth or developmental phenotype (Supplemental Fig. S3). These results show that CGEP plays a specific role in the chloroplast proteostasis network.
Figure 1. Phylogenetic analysis and subcellular localization of CGEP. A, Phylogenetic tree of 41 CGEP homologs showing two distinct clades: one with photosynthetic eukaryotes (mono- and dicotyledons, moss, a lycopod, and red and green algae) and one with only photosynthetic and nonphotosynthetic prokaryotes. Within these prokaryotes, CGEPs are found in many cyanobacteria (marked in pink) and many Gram-negative bacteria (marked in red; proteo- and flavo-bacteria). CGEP homologs were not obvious in other major branches of the tree-of-life (e.g. archaea, fungi, animals). Bootstrap values are shown. A complete list with the explanations of the abbreviations is provided in Supplemental Table S1 and the complete alignment is given in Supplemental Dataset S1. B, CGEP localized to Arabidopsis stroma, as determined by immunoblotting. CPN60 and LHCII were used as marker proteins for stroma and thylakoid, respectively. Ponceau-stained blot (before immunodetection) is shown below. C, Gel filtration of Arabidopsis stromal proteome showing that CGEP accumulates in complexes between ~150 and 600 kD. The Rubisco holocomplex at 550 kD is marked by an asterisk. D, Native PAGE of Arabidopsis chloroplast stroma shows CGEP dimerization in wild-type plants. The arrow indicates the Rubisco holocomplex at 550 kD. cgep-2 refers to an Arabidopsis null mutant for CGEP.

Proteome Phenotype of cgep-2
To determine if other chloroplast proteases were up- or downregulated in cgep, wild type and cgep were probed by immunoblotting for the abundance of stromal DEG2 (Nishimura et al., 2016) as well as thylakoid-localized FTSH2 and FTSH5 (Nishimura et al., 2016) and stress-responsive SPPA (Lensch et al., 2001; Wetzel et al., 2009). Accumulation levels of these proteases were unchanged in cgep as compared to wild-type plants (Fig. 2H). To more comprehensively understand the physiological response to the loss of CGEP, we compared the isolated chloroplast stromal proteomes of wild type and cgep in three biological replicates using SDS-PAGE followed by in-gel tryptic digests and identification and quantification by MS/MS (Supplemental Fig. S4). We previously successfully applied this workflow (Friso et al., 2011) to characterize other chloroplast protease mutants (Nishimura et al., 2013). Proteins were assigned to subcellular locations and functions based on updated curated information in the PPDB. The proteomics experiment identified 591 proteins, of which 553 were localized to plastids (Supplemental Dataset S2). To obtain a general view of the physiological proteome phenotype, we calculated protein investments for key chloroplast functions across three main categories: carbon metabolism; chloroplast biogenesis and proteostasis; and other metabolic pathways (Fig. 3; Supplemental Fig. S4). This demonstrated statistically significant altered investments in protein (un)folding (up 25%), plastid 70S ribosomes (up 53%), starch synthesis (down 39%), and glycolysis (down 32%). The overall investments in other chloroplast metabolic pathways (e.g. Calvin-Benson cycle, N- and S-metabolism, amino acid metabolism) were not significantly affected (Fig. 3A; Supplemental Fig. S4). At the individual protein level, 12 proteins, all plastid-localized, showed significant (false discovery rate < 5% and P < 0.01) differential accumulation between wild type and cgep (Fig. 3B). One of these was CGEP, identified with 124 matched MS/MS spectra across the three wild-type samples but never in cgep-2. Strikingly, five of these proteins are directly or indirectly involved in starch metabolism, as will be discussed below. The others are involved in the Calvin-Benson cycle and photorespiration (SBPase, SFBA-1, and PGP1/2), fatty acid metabolism (acetyl-CoA synthetase), amino acid metabolism (ABERRANT GROWTH AND DEATH 1 and 2), and a protein associated with abiotic stress (UOS1-like [similar to pea UV-B-and-ozone similarly regulated protein]; for full protein names, see the figure legend).
Figure 2. [...] Supplemental Table S4. B, Homozygous cgep alleles and wild type (wt) were grown on soil under an 18-h light/6-h dark regime with 130 µmol photons m⁻² s⁻¹ at 22°C. No obvious visible phenotypes were detected. Images of additional phenotyping on agar plates with and without the translational inhibitor CAP and on soil under various abiotic stress conditions are shown in Supplemental Figure S2. C, mRNA levels of CGEP in wild-type and homozygous cgep-1, cgep-2, and cgep-3 as determined by RT-PCR. ACTIN2 was used as control. D, SDS-PAGE and immunoblot with anti-CGEP serum of total soluble leaf extracts shows that the three independent cgep lines completely lack accumulation of CGEP. Ponceau-stained blot is shown below. E, CGEP genetically interacts with the Clp protease complex, as shown by visible growth and developmental phenotypes of the cgep clpr2-1 double mutant. F, SDS-PAGE and immunoblot with anti-CGEP serum of leaf extracts shows that the three cgep alleles completely lack accumulation of CGEP. Ponceau-stained blot before immunodetection is shown below. G, Quantitative measurement of fresh weight and rosette diameter in single (clpr2-1), double (cgep-2 clpr2-1; clpt1 clpt2), and triple (cgep-2 clpt1 clpt2) mutants. Asterisks indicate statistical significance using a t test (*P < 0.05). Additional images of mutants are shown in Supplemental Figure S3. H, Accumulation of thylakoid proteases SPPA, FTSH2, and FTSH5, and stromal DEG2 and CGEP, in wild type and cgep-2. This shows that the accumulation levels of these thylakoid proteases and DEG2 are unchanged in cgep-2.
Although we detected all major players in the stromal (un)folding machinery including CLPB3, HSP90, cpHSP70-1/2, the group of CPN10/20/60 proteins, and the HSP70 nucleotide exchange factors GRPE-1/2 (Supplemental Dataset S2), the increased protein investment in the (un)folding machinery was mostly due to increased amounts of the abundant CPN60 chaperones and peptidyl-prolyl isomerase ROC4. Within the function proteolysis, we identified 26 stromal proteases and protease chaperones (Supplemental Dataset S2), including the complete CLPRT core complex (11 proteins) and all three CLP chaperones (C1, C2, and D), dual-targeted organellar oligopeptidase (OOP), PREP1, and PREP2, stromal processing peptidase (SPP), DEG2, and several other less studied proteases. None of these proteases showed significant differential accumulation between wild type and cgep, with the exception of CGEP.
Furthermore, the abundance ratio for cgep/wild type of the complete stromal CLPPRTCD system (14 proteins) was 1.05, indicating that the CLP system was not up- or downregulated in response to loss of CGEP. The 53% increase in 70S ribosomes was due to an increase in both the 30S (41% up) and 50S (81% up) particles. Other functions within biogenesis and proteostasis, including proteins involved in RNA metabolism or translation, were not significantly affected (Fig. 3A).
Figure 3. Protein mass investment based on normalized adjusted spectral counts in specific chloroplast stromal proteins and functions in the wild type (wt) and cgep-2. Wild type and cgep-2 are represented by black and blue bars, respectively. Error bars correspond to the SDs across three biological replicates. A, Protein mass investment in specific chloroplast functions in the wild type and cgep-2. Asterisks indicate significant differences (*P < 0.05 and **P < 0.01). B, Proteins in wild type and cgep-2 that are significantly affected in cgep-2 compared to wild type. CGEP. AGD1 and AGD2, ABERRANT GROWTH AND DEATH1 and ABERRANT GROWTH AND DEATH2; ACS, acetyl-CoA synthetase; RFP, UOS1-like (similar to pea UV-B-and-ozone similarly regulated protein); SFBA-1, Fru-bisphosphate aldolase-1; SBPase, sedoheptulose-bisphosphatase; PGP-1,2, 2-phosphoglycolate phosphatase-1,2; PPA6, inorganic pyrophosphatase6; SBEII-3, starch branching enzyme class II-3; BAM2, beta-amylase2; DPE1, disproportionating enzyme1 (also D-enzyme). Aminotransferase, classes I and II (AGD2). Significance threshold was P < 0.01 and false discovery rate (FDR) < 0.05.
Within plastid glycolysis, we detected seven stromal proteins, which were all lower in cgep than in wild type (Supplemental Dataset S2), explaining the significant 30% decrease. The two most abundant enzymes were the well-characterized Glc-6-P isomerase and plastid phosphoglucomutase (starch free mutant1), driving the conversion of the Calvin cycle intermediate Fru-6-P to Glc-1-P immediately upstream of starch biosynthesis. The decrease of plastid phosphoglucomutase in cgep was statistically significant (Fig. 3B). The proteome analysis identified 21 proteins involved in starch metabolism, of which 11 function in starch synthesis and 10 function in starch degradation (Supplemental Fig. S4D; Supplemental Dataset S2). The starch metabolic enzymes were carefully annotated based on the most recent reviews and experimental studies (Goren et al., 2018; Smith and Zeeman, 2020) and The Arabidopsis Information Resource (TAIR10). Except for isoamylase2 (DBE1), all enzymes involved in starch synthesis were lower in cgep-2, with the reduction in inorganic pyrophosphatase6 and starch branching enzyme class II-3 being statistically significant. Within starch degradation, beta-amylase2 and disproportionating enzyme1 were both significantly reduced in cgep-2 (Fig. 3B).
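The protein investment comparison used in this section can be illustrated with a minimal Python sketch. It assumes that the investment in a function is the summed spectral count of its proteins divided by the total count of the sample. The protein names, function labels, and counts are hypothetical placeholders, not values from Supplemental Dataset S2, and the normalization and adjustment of spectral counts used in the study are not reproduced here.

```python
def investment_by_function(counts, functions):
    """Return the fraction of total spectral counts invested in each function."""
    total = sum(counts.values())
    shares = {}
    for protein, n in counts.items():
        func = functions[protein]
        shares[func] = shares.get(func, 0.0) + n / total
    return shares


def percent_change(mutant_share, wild_type_share):
    """Relative change (in percent) of the mutant share versus wild type."""
    return 100.0 * (mutant_share - wild_type_share) / wild_type_share


# Hypothetical placeholder data (not measured values).
functions = {"protein_a": "folding", "protein_b": "ribosome", "protein_c": "starch"}
wt_counts = {"protein_a": 40, "protein_b": 20, "protein_c": 40}
mut_counts = {"protein_a": 50, "protein_b": 30, "protein_c": 20}

wt_shares = investment_by_function(wt_counts, functions)
mut_shares = investment_by_function(mut_counts, functions)

for func in sorted(wt_shares):
    change = percent_change(mut_shares[func], wt_shares[func])
    print(f"{func}: {change:+.0f}%")
```

In practice, such a comparison would be repeated per biological replicate so that a statistical test can be applied to the per-function shares.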

mRNA-Based Coexpression Analysis
In our recent coexpression peptidase network (Majsec et al., 2017), CGEP forms a tight coexpression module (Module VI) with two other peptidases, namely dual-localized plastid/mitochondria OOP and stromal Met aminopeptidase 1B. This module contains 70 nonredundant genes making 81 edges and a relative enrichment in starch metabolism (7% compared with 0.8% for the whole network). However, this was a so-called "forced network" using only the plastid and mitochondrial peptidases and their auxiliary proteins as bait and limiting the number of coexpressors to the top 20. Therefore, we extracted the top-100 (based on mutual-rank [MR] values) coexpressed genes for CGEP from the coexpression database ATTED-II (http://atted.jp/) and evaluated enrichment for subcellular location and functions (Supplemental Dataset S3). This showed that proteins involved in starch metabolism, glycolysis, and the tricarboxylic acid cycle were significantly (hypergeometric test) overrepresented (30%, 11%, and 12%, respectively) when weighting for functional bin size (Supplemental Dataset S3). To better put these data in perspective, we also evaluated the top-100 coexpressors for several other stromal peptidases, namely CLPP5, CLPR1, and DEGP2, as well as the dual-targeted chloroplast/mitochondrial OOP, PREP1, and PREP2 (Supplemental Dataset S3). OOP also showed enrichment for starch metabolism (26% of weighted bin distribution) but not glycolysis or the tricarboxylic acid cycle. The coexpression of CGEP with starch metabolism was even more pronounced for the top-50 and top-20 coexpressing genes, increasing enrichment to 52% (top 50) and 63% (top 20), whereas enrichment decreased for OOP to 16% (top 50) and 13% (top 20). A closer look at the CGEP coexpressors shows that the coexpression of CGEP with starch degradation extends into the cytosolic conversion of maltose to Suc, as evidenced by the coexpression of disproportionating enzyme2, heteroglycan phosphorylase2, and Suc P synthase 1F.
The coexpression pattern strongly suggests that the function of CGEP is at least in part associated with starch metabolism (see "Discussion").
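The hypergeometric overrepresentation test mentioned above can be sketched in a few lines of Python. This is an illustration only: the function name and all numbers in the usage example (genome size, bin size, number of hits) are hypothetical and are not taken from Supplemental Dataset S3, and the weighting for functional bin size is not included.

```python
from math import comb


def hypergeom_enrichment_p(hits, population, annotated, drawn):
    """P(X >= hits) for a hypergeometric distribution.

    population: number of genes in the background set
    annotated:  number of background genes in the functional bin
    drawn:      size of the coexpression list (e.g. top 100)
    hits:       number of list members that fall in the functional bin
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical example: 8 of the top-100 coexpressors fall in a bin of
# 50 genes, against a background of 20,000 genes.
p_value = hypergeom_enrichment_p(hits=8, population=20000, annotated=50, drawn=100)
print(f"enrichment P = {p_value:.3g}")
```

Using exact integer binomial coefficients avoids any dependency beyond the standard library, at the cost of speed for very large backgrounds.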

CGEP Is an Active Peptidase that Can Degrade Small Proteins and Peptides
To probe the catalytic activity of CGEP, CGEP was expressed as an N-terminal GST-fusion protein in Escherichia coli and then purified over glutathione-agarose beads (rCGEP). As a negative control for CGEP activity, we generated the catalytic-site mutant S781R (rCGEP-S781R). Recombinant proteins were incubated with bovine serum albumin (BSA; 67 kD), β-casein (25 kD), and insulin (10 kD) as substrates. After 6-h incubation, rCGEP partially degraded β-casein (Fig. 4A) and completely degraded insulin (Fig. 4B), but rCGEP-S781R could not degrade either substrate (Fig. 4, A and B). Furthermore, BSA could not be degraded by active rCGEP (Fig. 4C). A recombinant Arabidopsis truncated CGEP consisting of just the C-terminal portion (773-961) including the active site residues (S781, D855, and H889) could not degrade insulin, indicating that the intact protein is required for proteolytic activity (Fig. 4D).

Autocatalytic C-Terminal Cleavage of CGEP In Vivo and In Vitro
We noticed that the proteolytically inactive rCGEP-S781R migrated at a slightly higher mass on the SDS-PAGE gels than rCGEP (Fig. 5A). Comparing the peptide sequence coverage of both active and inactive rCGEP from in-gel trypsin digests and MS/MS analysis, we found a clear difference at the C-terminal portion of CGEP. The most C-terminal-detected tryptic peptide in rCGEP-S781R was K↓V⁹³⁶STGTGGGNPEFGEHEVHSK⁹⁵⁴ (the arrow indicates the tryptic cleavage site), whereas the most C-terminal peptide in rCGEP was the semitryptic peptide K↓E⁹²⁸GSDADKVSTGTGGGNPE⁹⁴⁶ (Fig. 5B; Supplemental Table S2). Subsequent inspection of CGEP sequence coverage from in vivo Arabidopsis samples across many previous experiments on a range of wild-type Arabidopsis leaf and chloroplast samples (e.g. Zybailov et al., 2008; Olinares et al., 2010), as viewed through the PPDB, showed that the most C-terminal coverage observed in vivo came from the semitryptic peptide K↓E⁹²⁸GSDADKVSTGTGGGNPE⁹⁴⁶, similar to the most C-terminal peptide observed for rCGEP (Fig. 5B). This was confirmed with Arabidopsis in vivo samples by MS/MS analysis of CGEP enrichment through immunoprecipitation (using CGEP serum) of the chloroplast stromal proteome of wild-type plants (Supplemental Table S3). Inspection of the sequence alignment (Supplemental Fig. S1B) and sequence logo of CGEP homologs in dicotyledons showed a strong conservation of the residues A[S/T]GGGXPE⁹⁴⁶ (Fig. 5D). The in vitro and in vivo data, together with this sequence conservation, strongly suggest autocatalytic processing C-terminal of E946, hence removing 15 residues (FGEHEVHSKLRRSLL) from the C terminus.
Figure 4. Proteolytic assays of Arabidopsis rCGEP and rCGEP-S781R. A to C, β-casein, insulin, and BSA degradation assays by rGST-CGEP and rGST-CGEP-S781R. rCGEP degraded casein (A) and insulin (B), but not BSA (C). CGEP-S781R could not degrade casein (A) or insulin (B). rCGEP variants were incubated with substrate proteins at 37°C for 0 h or 6 h, then denatured by SDS solubilization buffer and loaded onto SDS-PAGE gels for visualization with Coomassie Brilliant Blue. For all reactions, the molar ratio between CGEP and substrate was 1:1. D, Recombinant Arabidopsis CGEP, consisting of just the C-terminal portion including the active site residues (773-961) fused to the maltose-binding protein, cannot degrade insulin. MW, molecular weight.
Figure 5. [...] shows that rGST-CGEP-S781R is a few kilodaltons larger than GST-CGEP. B, CGEP peptides identified by MS/MS from rGST-CGEP and rGST-CGEP-S781R projected on the C-terminal portion of the CGEP primary amino acid sequence. The identified peptide sequences based on MS/MS analysis are shown in red (see Supplemental Table S2). As can also be viewed in the Plant Proteome Database, we also identified the peptide PEFGEHEVHSK a total of 49 times across three independent experiments (experiments 2221, 2222, and 2223) from stromal samples of cgep-2, but not in the wild-type samples. C, CGEP peptides identified by MS/MS from endogenous in vivo Arabidopsis CGEP matching to the C-terminal portion of the CGEP primary amino acid sequence. CGEP was immunoprecipitated with CGEP antiserum from chloroplast stromal proteome isolated from wild-type plants (see Supplemental Table S3; all identified peptides are projected on the primary sequence in Supplemental Figure S5). D, Sequence logo of the C-terminal 17 residues generated from CGEP proteins of 11 dicot species shows strong conservation of the residues (A[S/T]GGGXPE) immediately upstream of E946.
We observed a C-terminal Trp (W907 in Arabidopsis CGEP) conserved across all photosynthetic CGEP homologs (Supplemental Fig. S1, B-D), thus providing a good reference for comparing the C-terminal extension across CGEP homologs (see Supplemental Table S1). Using this reference, the C-terminal extension in Arabidopsis is 55 amino acids long, with the autocatalytic processing trimming this by 15 residues to 39 residues. The monocots also have C-terminal extensions, but these diverged from the extension in the dicotyledons (Supplemental Fig. S1B). Green algae generally have shorter C-terminal extensions of between 10 and 30 residues, whereas moss and lycopod have 30 and 36 residues, respectively (Supplemental Fig. S1D); very little conservation is found between them. Finally, cyanobacterial CGEP homologs have only a few residues (9-11 amino acids) beyond this conserved tryptophan, thus removing the need for autocatalytic cleavage (Supplemental Fig. S1C). We discuss the position of the C-terminal extension further below ("Homology Model of the CGEP Monomer and C-Terminal Cleavage").
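As a small worked example of the C-terminal processing described in this section, the Python sketch below enumerates every possible cleavage C-terminal of Glu in the CGEP tail and picks out the one that releases the observed 15-residue fragment. The tail sequence is assembled from the semitryptic peptide ending at E946 and the removed residues FGEHEVHSKLRRSLL reported above; the function name is illustrative and not part of any published tool.

```python
# C-terminal tail of Arabidopsis CGEP, assembled from the peptides in the text:
# the semitryptic peptide ending at E946 plus the 15 residues removed.
RETAINED_PEPTIDE = "EGSDADKVSTGTGGGNPE"
RELEASED_FRAGMENT = "FGEHEVHSKLRRSLL"
C_TERMINAL_TAIL = RETAINED_PEPTIDE + RELEASED_FRAGMENT


def glu_cleavage_products(sequence):
    """List (retained, released) pairs for cleavage after each internal Glu."""
    products = []
    for i, residue in enumerate(sequence[:-1]):
        if residue == "E":
            products.append((sequence[: i + 1], sequence[i + 1 :]))
    return products


candidates = glu_cleavage_products(C_TERMINAL_TAIL)
observed = [pair for pair in candidates if pair[1] == RELEASED_FRAGMENT]

print(f"{len(candidates)} candidate Glu sites in the tail")
for retained, released in observed:
    print(f"observed site: ...{retained[-6:]} | {released} ({len(released)} residues)")
```

The tail contains several Glu residues, so Glu specificity alone does not single out E946; the MS/MS evidence described above is what identifies the processing site.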

CGEP Cleaves C-Terminal of Glu, Independent of Neighboring Residues, as Demonstrated by PICS
Previous reports with a very limited set of synthetic peptides and cleavage sites in a few model substrates (Yamauchi et al., 2001; Forsberg et al., 2005), and our observed C-terminal autocleavage described above, indicate that CGEP can cleave the peptidyl bond immediately C-terminal of Glu residues. These data do not determine if Arabidopsis CGEP can also cleave after other residues or if there are other cleavage determinants, sometimes referred to as "subsite cooperativity" (Ng et al., 2009). We also wanted to better resolve the upper and lower substrate size limits. Therefore, we applied the PICS method or variations thereof (Schilling et al., 2011; Biniossek et al., 2016). For each experiment, we incubated rCGEP and rCGEP-S781R (as negative control) with very large (>100,000) peptide libraries and compared the resulting peptides by MS/MS analysis. Complementary peptide libraries were generated by digesting total soluble Arabidopsis leaf proteomes with the commercial peptidases trypsin (cleaving C-terminal of Lys/Arg), LysC (C-terminal of Lys), or GluC (C-terminal of Glu).
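To make the library design concrete, the following Python sketch performs a simplified in silico digest with the three specificities named above. It is an idealized illustration under stated assumptions: every site is cleaved completely (no missed cleavages), no sequence-context rules are applied, and the example protein sequence is a hypothetical placeholder.

```python
# Residues after which each library protease cleaves (as stated in the text).
SPECIFICITY = {"trypsin": "KR", "LysC": "K", "GluC": "E"}


def digest(protein, cleave_after):
    """Cut a sequence C-terminal of every residue listed in cleave_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start : i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


example_protein = "MAKVLRGGEAS"  # hypothetical placeholder sequence
for protease, residues in SPECIFICITY.items():
    print(protease, digest(example_protein, residues))
```

Because the three proteases cut at different residues, the same proteome yields libraries with different peptide termini, which is what makes the libraries complementary for profiling a Glu-specific peptidase.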
Figure 6. PICS Experiment 1 to determine the peptidase activity of CGEP. A, Workflow of PICS Experiment 1. The MS/MS results are compiled in Supplemental Dataset S4. B, Sequence logo and iceLogo plots for CGEP-specific cleavage of peptide libraries from total soluble leaf extracts reacted with rCGEP or rCGEP-S781R, P10 through P10′. Peptides absent in control (rCGEP-S781R, active site abolished) but with at least three spectral counts in CGEP samples were used. For the iceLogo plots, the weighting was done using the amino acid composition of the complete predicted Arabidopsis proteome (TAIR10); percent difference shown, P = 0.01. C, Examples of relatively large peptides generated by CGEP glutamyl peptidase activity. The matched proteins and charge state (z) of the peptides are indicated.
In the first experiment outlined in Figure 6A, all three types of peptide libraries were first dimethylated to block and chemically mark all primary amino groups (N termini and Lys side chains) and these libraries were then incubated with active rCGEP or catalytically inactive rCGEP-S781R. The newly rCGEP-generated amino termini, neo-N-termini, were reacted with a cleavable amine cross linker containing a biotin moiety. The biotin tag was then used for affinity enrichment of these neo-N-terminal peptides for subsequent nanoscale liquid chromatography (nano-LC)-MS/MS analysis. The number of matched MS/MS spectra per unique peptide was summed for each and the list collapsed to identify cleavage events identified in the CGEP but not in the CGEP-S781R-treated samples. We observed 324 peptides (2,157 MS/MS; trypsin library), 197 peptides (863 MS/MS; LysC library), and 65 peptides (215 MS/MS; GluC library) that were specific to rCGEP (absent in CGEP-S781R) and observed by at least three MS/MS spectra (Supplemental Dataset S4). Of these specific peptides, 89% (trypsin library), 96% (LysC library), and 82% (GluC library) resulted from cleavage after Glu (Supplemental Dataset S4). Figure 6B visualizes this substrate cleavage preference of CGEP as determined from these digested libraries using sequence logos (https://weblogo.berkeley.edu/logo.cgi) or iceLogos (https://iomics.ugent.be/icelogoserver/) after weighting for the amino acid composition of the proteome. This demonstrates strong enrichment for Glu in the P1 position, reflecting the specificity of CGEP. There was very little sequence preference or avoidance up- or downstream of the P1 residue, except for a weak (10%) enrichment (P = 0.01) for Gly in the P1′ position (Fig. 6B). The GluC library yielded the fewest CGEP cleavages, which is logical because GluC also cleaves C-terminal of Glu, but the peptides in the GluC library did include many missed cleavages, hence allowing for additional cleavages by CGEP. The largest peptides specifically generated by rCGEP (absent in rCGEP-S781R) from the trypsin and LysC libraries were two different peptides 29 residues in length (observed 16 times, z = 3+; observed six times, z = 2+); both of these large peptides were generated by cleavage C-terminal of Glu (Fig. 6C).
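The percentage of cleavages occurring after Glu, as reported above, amounts to tallying the P1 residue across all CGEP-specific cleavage events. The Python sketch below shows one way to do this; the cleavage windows are hypothetical placeholders (four residues on each side of the scissile bond), not entries from Supplemental Dataset S4, and no background correction such as that applied by iceLogo is attempted.

```python
from collections import Counter


def p1_frequencies(cleavage_windows, p1_index):
    """Fraction of cleavage events with each residue at the P1 position.

    Each window is a string spanning the scissile bond; p1_index is the
    position of the residue immediately N-terminal of the cleaved bond.
    """
    counts = Counter(window[p1_index] for window in cleavage_windows)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical P4-P4' windows; the bond is cleaved between index 3 and 4.
windows = ["GNPEFGEH", "AVLEGTKS", "TSAEGLVD", "KLMDAGSV"]
frequencies = p1_frequencies(windows, p1_index=3)

for residue, fraction in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f"P1 = {residue}: {fraction:.0%}")
```

The same tally can be repeated for every position from P10 to P10′ to obtain the raw frequencies that underlie a sequence logo.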
In a second, complementary experiment summarized in Figure 7A, unmodified peptide libraries (trypsin and GluC) were incubated with CGEP or CGEP-S781R for digestion. Subsequently, CGEP- or CGEP-S781R-digested peptides were dimethylated with CH2O ("light" formaldehyde) or CD2O ("heavy" formaldehyde), followed by mixing in equal proportions to allow for direct comparison by nano-LC-MS/MS. We only considered peptides observed in both replicates (Supplemental Dataset S5). We observed 142 (579 MS/MS) and 79 (293 MS/MS) cleavage events that were specific to rCGEP (observed by at least two MS/MS spectra and absent in CGEP-S781R) in libraries made with trypsin and GluC, respectively. Figure 7B visualizes the substrate cleavage preference of CGEP as determined from these digested libraries using sequence logos or iceLogos after weighting for the amino acid composition of the proteome. Similar to what was indicated in the previous experiment (Fig. 6), this demonstrates the strong enrichment for Glu in the P1 position, reflecting the specificity of CGEP. There was very little sequence preference up- or downstream of the P1 residue, showing that CGEP cleaves C-terminal of Glu irrespective of the surrounding residues (Fig. 7B). We then carefully evaluated to what extent the specific rCGEP-generated peptides resulted from N- or C-terminal exo-glutamyl peptidase or endo-glutamyl peptidase activity. Figure 7C shows three examples for each of these activities. We note that exo-peptidases are defined as peptidases that can cleave peptidyl bonds one, two, or three residues away from the N or C terminus, thus releasing one amino acid, di-, or tripeptides, respectively. We observed examples of both N- and C-terminal exo-glutamyl peptidase activity, as illustrated in Figure 7C. Most of the rCGEP activity resulted from endo-glutamyl peptidase activity (Fig. 7C). The largest peptide specifically generated by rCGEP (absent in rCGEP-S781R) from the trypsin library had 43 residues and was generated from a 49-residue, full tryptic peptide, as specifically observed in rCGEP-S781R (Fig. 7C).

Figure 7. [...] B, The iceLogo plots for CGEP-specific cleavage of peptide libraries from total soluble leaf extracts reacted with rCGEP or rCGEP-S781R. Peptides absent in control (rCGEP-S781R, active site abolished) but with at least two spectral counts in CGEP samples were used, P10 through P10'. For the iceLogo plots, the weighting was done using the amino acid composition of the complete predicted Arabidopsis proteome (TAIR10); percent difference shown, P = 0.01. Most CGEP-specific cleavages resulted from endo-glutamyl-peptidase activity. C, Examples of N- and C-terminal exo-glutamyl-peptidase activity (releasing one amino acid residue, dipeptide, or tripeptide), as well as endo-peptidase activity (cleavage at least three residues away from the N or C terminus) from the trypsin library's digest. The number of matched MS/MS spectra for each peptide in the rCGEP or CGEP-S781R protease is listed. (1) AT1G06680.1, (2) AT1G20340.1, (3) AT1G23310.1, (4) AT5G38410.1-C term protein, (5) AT2G47390.1, (6) ATCG00490.1, (7) AT1G09340.1, (8) AT1G06680.1, and (9) AT1G13930.1.
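The cleavage-window tally behind a sequence logo of this kind can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the `-` padding character, and the toy sequences are assumptions made for the example. Each observed cleavage is represented as a fixed-width window (P10 through P10' by default) around the cleaved bond, and per-position residue frequencies are then computed across all windows.

```python
from collections import Counter


def cleavage_window(protein: str, p1_index: int, flank: int = 10) -> str:
    """Return the residues P{flank}..P1 followed by P1'..P{flank}'.

    The cleaved bond lies directly after protein[p1_index] (the P1 residue).
    Windows that run past either end of the protein are padded with '-'.
    """
    left = protein[max(0, p1_index - flank + 1): p1_index + 1].rjust(flank, "-")
    right = protein[p1_index + 1: p1_index + 1 + flank].ljust(flank, "-")
    return left + right


def position_frequencies(windows):
    """Per-position residue frequencies across windows, ignoring padding."""
    width = len(windows[0])
    freqs = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows if w[pos] != "-")
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs


# Toy example (hypothetical sequences): P1 sits at index flank - 1 of each window.
windows = [cleavage_window("AAAEGGG", 3, flank=2), cleavage_window("MKEGA", 2, flank=2)]
p1_freqs = position_frequencies(windows)[1]
```

In an iceLogo-style comparison, these observed frequencies would then be contrasted with the background residue frequencies of the reference proteome; that step is omitted here.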

CGEP without C-Terminal Autocleavage Is Proteolytically Active but Limited in Substrate Size
We next asked if C-terminal autocleavage of CGEP is required for its proteolytic activity in vitro or in vivo. As described above ("Autocatalytic C-Terminal Cleavage of CGEP In Vivo and In Vitro" section), in vitro and in vivo analysis identified residue E946 as the most C-terminal residue in CGEP (Fig. 5, B and C). We generated a CGEP construct, named CGEP-E946A-E949A-E951A (CGEP-C2), in which C-terminal Glu residues E946, E949, and E951 were mutated into Ala residues. In addition, we also created a CGEP construct in which E926 and D931 were changed into Ala residues (CGEP-C1). Together, these two constructs mutated every Glu in the C-terminal extension (Fig. 8A). For in vitro testing, CGEP-C1 and CGEP-C2 were produced with an N-terminal GST fusion for affinity purification. These GST-CGEP fusion proteins were overexpressed in E. coli, affinity-purified, and run on an SDS-PAGE gel along with GST-CGEP and GST-CGEP-S781R, followed by Coomassie staining or immunoblotting with anti-CGEP serum (Fig. 8B). This showed that GST-CGEP and GST-CGEP-C1 have a lower molecular mass than GST-CGEP-S781R and GST-CGEP-C2, indicative of the lack of C-terminal processing in the latter two proteins. Indeed, MS/MS analysis confirmed the cleavage after E946 in CGEP-C1 (as in CGEP) but not in CGEP-C2 (as in CGEP-S781R; Supplemental Table S2). Previously, we showed that rCGEP can completely degrade insulin (10 kD) as well as β-casein (25 kD), albeit with lower efficiency (Fig. 4, A and B). To determine protease activity of CGEP with an unprocessed C terminus, rGST-CGEP-C2 was incubated for 6 h at 37°C with insulin or casein. This showed that GST-CGEP-C2 completely degrades insulin but not casein (Fig. 8C), suggesting that the unprocessed C terminus limited access to the active site.

Figure 8. Analysis of C-terminal variants CGEP-C1 and CGEP-C2 using rGST-CGEP fusion proteins. A, C-terminal sequence of CGEP and the CGEP-C1 and CGEP-C2 variants. W907 is the conserved residue across CGEP homologs in photosynthetic organisms; E929A and D931S are the mutations in CGEP-C1, and E946A, E949A, and E951A are the mutations in CGEP-C2. The arrow indicates the cleavage site of the C-terminal prosequence in wild type (in vitro and in vivo) as shown in Figure 5. Peptides matched for the C-terminal regions are provided in Supplemental Table S2. B, Comparison of protein size of rGST fusion variants of CGEP (wild type, S781R, C1, and C2) shows that rCGEP-S781R and rCGEP-C2 are a few kilodaltons larger than rCGEP and rCGEP-C1, indicating their lack of autocatalytic C-terminal processing. Top shows the Coomassie-stained gel; middle shows the Ponceau-stained blot; and the bottom shows the immunoblot with anti-CGEP serum. C, Degradation assay of insulin and casein using rGST-CGEP-C2 showed that GST-CGEP-C2 can completely degrade the 10-kD insulin, but it cannot degrade the larger 25-kD β-casein. Incubation times (h) are shown.

Homology Model of the CGEP Monomer and C-Terminal Cleavage
To better understand CGEP, the possible significance of the C-terminal autocleavage, as well as substrate interactions and size limitations, a 3D structural model was constructed based on the mature CGEP (residues 63-961; predicted cTP removed). The best scoring threading template was the crystal structure of the S9B dipeptidyl aminopeptidase IV in the Gram-negative bacterium Stenotrophomonas maltophilia (PDB:2ECF; Nakajima et al., 2008). A sequence alignment with the predicted secondary structures of this bacterial protein and CGEP from Arabidopsis, Brassica rapa, and Populus trichocarpa is shown in Supplemental Figure S5. The 3D homology CGEP model shows a typical β-propeller domain (upper) and α/β-hydrolase domain (lower) containing the catalytic triad of S-D-H residues (Fig. 9A). The active site is partially accessible through a shallow cavity that is ~15 Å wide and at most 10 Å high, visible from the front view. A very narrow cavity (<4 Å) can also be seen from the top view, extending through the center of the β-propeller domain (Supplemental Fig. S6A); this feature was also noted in other S9 peptidases and is thought to be too narrow for a substrate to pass through without substantial rearrangement of tertiary structure (Tsirigotaki et al., 2017). The structures of CGEP (rainbow colors) and dipeptidyl aminopeptidase IV (gray) were overlaid, showing close alignment of α-helices and β-sheets throughout, but little to no alignment around the mouth of the central cavity at the N- and C-terminal regions (Fig. 9B). The reliability confidence score (C-score) of -2.29 for the Arabidopsis model is relatively low, largely due to uncertainty for the N-terminal region. Therefore, we generated a second model (based on the same structure) for an N-terminally truncated CGEP (residues 387-961) that provided a better C-score of -0.93 (Fig. 9, C and D; Supplemental Fig. S6, B and C). This CGEP structure showed that the C-terminal stretch of 15 residues, which is autocatalytically removed, is in close proximity to the active site, and its removal may increase accessibility to the active site (Fig. 9E; see "Discussion").

C-Terminal Autocatalytic Cleavage In Vivo
To verify autocleavage in the C terminus in vivo, we generated four 35S-driven C-terminally tagged CGEP variants and transformed these transgenes into cgep-2. These transgenes were CGEP-STREPII (wild-type-like), CGEP-S781R-STREPII (catalytically inactive), CGEP-C1-STREPII (in which the Glu residue E926 and the Asp residue D931 in the C-terminal region of CGEP were changed into Ala residues), and CGEP-C2-STREPII (in which the Glu residues E946, E949, and E951 were changed into Ala residues; Supplemental Fig. S7). The C-terminal STREPII tag was included for affinity purification and to rapidly determine if the C terminus was cleaved in vivo using immunoblotting. Transformants were identified on medium with glufosinate (BASTA) and by PCR-based genotyping, and RT-PCR confirmed expression of the transgenes (Fig. 10A; Supplemental Fig. S7, B and C). Immunoblotting of stromal proteomes of the confirmed transformants using anti-CGEP- and anti-STREPII-specific antisera showed CGEP in wild type and the two transgenic lines cgep-2/CGEP-STREPII and cgep-2/CGEP-S781R-STREPII, but not in cgep-2, as expected (Fig. 10B; Supplemental Fig. S7D). However, no signals were observed with anti-STREPII serum in wild type, cgep-2, and wild-type/CGEP-STREPII, whereas a clear signal was observed in wild-type/CGEP-S781R-STREPII (Fig. 10B; Supplemental Fig. S7D). This is fully consistent with in vitro autocatalytic C-terminal cleavage by active CGEP, which thus removes the STREPII tag, whereas this tag was not removed by the inactive CGEP-S781R-STREPII peptidase. Protein analysis of leaves from cgep-2/CGEP-C1-STREPII showed that CGEP is not autocatalytically cleaved at E928 or D951, as shown by the lack of detectable STREPII affinity tag by immunoblotting (Fig. 10B). By contrast, autocatalytic cleavage was blocked in cgep-2/CGEP-C2-STREPII, as evidenced by the detection of the C-terminal STREPII tags in this line (Fig. 10B; Supplemental Fig. S7E). Native gel analysis with the various C-terminal mutant lines showed that the C-terminal autocatalytic cleavage is not required for dimerization (Supplemental Fig. S7F). Careful comparative analysis of the growth phenotypes of cgep-2 and the four transgenic lines showed that cgep-2/CGEP-S781R-STREPII and cgep-2/CGEP-C2-STREPII had significantly reduced growth, whereas cgep-2/CGEP-STREPII and cgep-2/CGEP-C1-STREPII were not significantly different from cgep-2 (Fig. 10C). This showed that overexpression of inactive CGEP, or of active CGEP with an unprocessed C terminus, negatively impacts growth.

Figure 9. Arabidopsis CGEP 3D-structural model generated with I-TASSER. A, Cartoon representation of front- and side-view model of mature CGEP (residues 63-961; predicted cTP removed). CGEP with active site residues S718, D792, and H826 (S781, D855, and H889 in full length) displayed in ball-and-stick representation. Colored rainbow from blue (N terminus) to red (C terminus). C-score = -2.29; estimated TM-score = 0.45 ± 0.14; estimated RMSD = 14.4 ± 3.7 Å. Dimensions are indicated. The left and right show the same model, but turned by 90°. B, CGEP model colored as rainbow, overlaid with the x-ray crystal structure of dipeptidyl aminopeptidase IV from S. maltophilia (PDB:2ECF; gray). Color coding as in A. C, Front view of model for the CGEP with N-terminal domain removed, showing residues 378 to 962. Coloring as in A. C-score = -0.93; estimated TM-score = 0.60 ± 0.14; estimated RMSD = 9.8 ± 4.6 Å. D, Space-filling model of structure in C with active site residues in pink and cleaved C-terminal extension in gray. E, Space-filling model of structure in D with 15-residue C-terminal extension cut, exposing active site residues.

The Physiological Significance of CGEP and Its C-Terminal Autocleavage Determined by In Vivo Complementation in the cgep clpr2-1 Background
None of the three cgep null mutant alleles have an obvious growth or developmental phenotype under optimal or under various abiotic stress conditions, as was shown above (Fig. 2B; Supplemental Fig. S2).
However, cgep alleles do create a phenotype in the clpr2-1 background (Fig. 2, E and G), thus providing an opportunity to further test the physiological significance of CGEP variants.

Searching for CGEP Protein Interactors and Possible Trapped Substrates
In an extensive effort to identify CGEP interactors and potential (trapped) substrates, we compared the in vivo protein interactomes of CGEP-STREPII, CGEP-S781R-STREPII, and CGEP-C2-STREPII by either coimmunoprecipitation with CGEP antiserum or using the STREPII tag for affinity purification (Supplemental Fig. S10; Supplemental Text S1). Whereas excellent recovery for CGEP was obtained in these experiments, no obvious interactors emerged, suggesting only weak interactions between substrates and CGEP, or that substrates are too small to be identified. Supplemental Text S1 summarizes these experiments.

DISCUSSION
S9D proteases are unique to bacteria, including cyanobacteria, and plastids in photosynthetic eukaryotes. The S9 family of α/β-hydrolases (part of Clan SC) has four subfamilies of Ser-type peptidases: S9A, with cytosolic prolyl oligopeptidases from bacteria, archaea, and eukaryotes; S9B, with acylaminoacyl peptidases only found in eukaryotes; S9C, with membrane-bound dipeptidyl-peptidase IV in both bacteria and eukaryotes; and S9D, to which chloroplast CGEP was assigned (Rawlings et al., 2014). As our phylogenetic analysis also illustrates, S9D peptidases are found in bacteria and in photosynthetic eukaryotes (where they are most likely confined to plastids), but not in other eukaryotes. Plastid CGEP is clearly of bacterial origin, but the phylogenetic analysis does not support a direct endosymbiotic origin from cyanobacteria. S9 peptidases are generally believed to hydrolyze only relatively short peptide substrates, whereas large structured peptides and proteins are not usually cleaved (Rea and Fülöp, 2006). Crystal structures have been solved (although not for plant proteins) for S9A (PDB:1QFS and PDB:1QFM; Fülöp et al., 1998), S9B (PDB:2ECF; Nakajima et al., 2008), and S9C (PDB:5YZM, 6IGQ, and 6IGP; Yadav et al., 2019) members, but not for S9D members. S9 structures show an N-terminal, eight-bladed β-propeller and a C-terminal peptidase unit folded as an α/β/α sandwich that contains the catalytic triad. These two domains form bowl-like structures that together form a large central cavity, restricting access to the catalytic site (Rea and Fülöp, 2006). The catalytic residues are Ser-Asp-His, and conserved motifs around Ser define the A to D families (GGSXGG, GWSYGG, GGSYGG, and GGHSYGA for S9A to S9D, respectively).
Two other plant S9 peptidases, both in the S9A family, have been studied experimentally in plants, namely the acyl-amino acid-releasing peptidase1 in Arabidopsis (AARE1, AT4G14570; Nakai et al., 2012) and a tyrosyl aminopeptidase in daikon radish (Raphanus sativus; the likely Arabidopsis homolog is AARE2, AT5G36210; Tsuji et al., 2011). Arabidopsis cytosolic Dipeptidyl peptidase IV-like (AT5G24260) is a member of the S9B family, but it has not been studied. There are, so far, no known S9C members in plants. Whereas CGEP is more closely related to AARE1 and AARE2 proteins than any other Arabidopsis protein, CGEP, AARE1, and AARE2 are only distantly related, with sequence similarity only in the C-terminal peptidase domain.

[...] Figure S9. B, Detection of transgenic CGEP protein and the CGEP C-termini by STREPII and CGEP antisera in wild-type and complemented cgep-2 clpr2-1 plants by immunoblotting using soluble leaf extracts. Whereas there are high levels of CGEP protein detected, the STREPII-tagged portion is not detected in cgep-2 clpr2-1/CGEP-STREPII and cgep-2/CGEP-C1-STREPII, demonstrating autocatalytic cleavage of the C-termini in these CGEP variants. By contrast, detection of STREPII in cgep-2/CGEP-C1-STREPII demonstrates that C-terminal cleavage is inhibited. The Ponceau stains show the blot before immunodetection. C, Rosette diameter of clpr2-1, cgep-2, and cgep-2 clpr2-1 complemented with 35S-CGEP-STREPII and its modified forms S781R, C1, and C2. Plants were grown for 26 d on soil under a 16-h light/8-h dark cycle. Averages and SDs are indicated for eight plants per genotype (n = 8). Asterisks indicate a significant difference (**P < 0.01) between the indicated genotype and cgep-2.
This study shows that CGEP is a stromal oligomeric protein; is physiologically important in stromal proteostasis and starch metabolism; has clear cleavage specificity; has a maximal substrate size limitation, which is influenced by a surprising autocatalytic C-terminal cleavage; and has specific genetic interactions with the stromal CLP protease system, but not with the abundant thylakoid FTSH protease.

CGEP Cleaves C-Terminal of Glu through Endo-and Exo-Peptidase Activity
The PICS analyses presented in this study using rCGEP and CGEP variants demonstrated that CGEP can cleave C-terminal of Glu residues with high fidelity and without any discernible impact of neighboring residues. Furthermore, CGEP efficiently degrades the 10-kD protein bovine insulin and partially degrades 25-kD bovine β-casein, but not larger proteins such as BSA (67 kD). This glutamyl peptidase activity is consistent with observed activity in enriched leaf extracts from spinach, pea, or cucumber leaves using the synthetic peptide carbobenzoxy-L-L-E-naphthylamide (Laing and Christeller, 1997; Yamauchi et al., 2001) or a soluble 7.5-kD recombinant peptide (N terminus of LHCII-1) with multiple Glu residues (Forsberg et al., 2005). This study showed that CGEP has both exo- and endo-glutamyl peptidase activity, i.e. it can cleave C-terminal of Glu residues within any of the three residues of the N- or C-termini (exo-peptidase), as well as within substrates further away from the termini (endo-peptidase). Most peptidases in biology have either exo- or endo-peptidase activity; however, there are exceptions such as human metallopeptidase angiotensin I-converting enzyme (MA-E M2 family; Naqvi et al., 2005), vertebrate Cys peptidase cathepsin B (Krupa et al., 2002), and plant cathepsin B-like peptidases (Tsuji et al., 2008; Porodko et al., 2018). Cathepsin B homologs have C-terminal dipeptidase activity (releasing dipeptides from the C terminus of substrates) as well as endopeptidase activity; this dual activity is regulated through an occluding loop. Our analysis showed that CGEP can digest proteins by cleavage of the peptide bond immediately C-terminal of Glu, irrespective of the position of the Glu and any neighboring residues, as long as proteins are relatively small.
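The exo/endo distinction used here can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the analysis code of the study: the function names and the example peptide are hypothetical, and the three-residue threshold follows the definition given in the text (a cleavage that releases one, two, or three residues from either terminus counts as exo-peptidase activity).

```python
def glu_sites(substrate: str):
    """Indices of Glu residues that are followed by a cleavable peptide bond."""
    return [i for i, aa in enumerate(substrate[:-1]) if aa == "E"]


def classify_cleavage(substrate: str, p1_index: int, max_exo: int = 3) -> str:
    """Classify a cleavage directly C-terminal of substrate[p1_index].

    Returns 'N-terminal exo' or 'C-terminal exo' when the released fragment
    is at most max_exo residues long, and 'endo' otherwise. For very short
    substrates where both fragments qualify, the N-terminal label is returned.
    """
    if substrate[p1_index] != "E":
        raise ValueError("P1 residue is not Glu")
    n_fragment = p1_index + 1                 # length of the N-terminal product
    c_fragment = len(substrate) - n_fragment  # length of the C-terminal product
    if c_fragment == 0:
        raise ValueError("no peptide bond after the last residue")
    if n_fragment <= max_exo:
        return "N-terminal exo"
    if c_fragment <= max_exo:
        return "C-terminal exo"
    return "endo"


# Hypothetical 14-residue peptide with Glu at positions 2, 7, and 12 (1-based).
peptide = "AEKLMNEQRSTEVK"
labels = {i: classify_cleavage(peptide, i) for i in glu_sites(peptide)}
```

For this peptide, the three Glu sites fall into the three classes: one near the N terminus, one internal, and one near the C terminus.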

Autocatalytic Cleavage of the C-Terminal Extension of CGEP and Its Functional Role
This study discovered and clearly documented C-terminal cleavage of CGEP at E946, both in vitro with rCGEP and in vivo in Arabidopsis leaves from observation of endogenous CGEP, as well as from CGEP variants with a C-terminal STREPII tag expressed from stable transgenes. The cleavage removed the C-terminal 15 amino acids. The evidence was based on MS/MS analysis as well as immunoblotting. rCGEP rendered incapable of autocatalytic cleavage through an E946A mutation had a reduced substrate size limit, as it was not able to cleave β-casein whereas it still cleaved the smaller insulin effectively. This showed that removal of the C-terminal 15 residues extends the substrate size range of CGEP, likely through providing increased access to the catalytic cavity. Indeed, 3D structural modeling based on the S9B dipeptidyl aminopeptidase IV from S. maltophilia (Nakajima et al., 2008) supports this hypothesis, as the C terminus is positioned at the entry to the catalytic cavity. Importantly, the C-terminal cleavage is also physiologically important, as demonstrated by the in vivo complementation assays in the cgep-2 clpr2-1 background. We explored the literature for other examples of C-terminal autocleavage of peptidases and found an example in the unrelated chymotrypsin-like Cys protease 3CL (3CLpro) from the Severe Acute Respiratory Syndrome (SARS) coronavirus (Muramatsu et al., 2016). 3CLpro was shown to autocatalytically (by another copy of 3CLpro) cleave the 10-residue C-terminal prosequence using a noncanonical specificity. A different example is a Leu aminopeptidase from Pseudomonas aeruginosa, which is C-terminally cleaved by other proteases, resulting in intramolecular autocatalytic N-terminal cleavage (12 amino acids removed) and leading to its activation (Sarnovsky et al., 2009). CGEP is a member of the plant- and bacterial-specific S9D family, and not much is known about this family.
However, the better studied S9A, S9B, and S9C peptidase families with different peptidase activities (no glutamyl peptidases) have been shown to use sophisticated and diverse mechanisms to regulate access to the buried active site. This includes transient openings, double-gated entry mechanisms, and active site assembly/disassembly, as reviewed in Rea and Fülöp (2006). The C-terminal cleavage identified here for CGEP is a novel mechanism that is likely a relatively new evolutionary diversification, as it seems absent in algae, primitive plants, and (cyano)bacteria.

The Importance of Glu in Plant Cells and the Role of CGEP
The amino acid Glu is found in abundance in the plant cell (~10% to 40% of the total amino acid pool), likely because of its central role in primary metabolism, including the photorespiratory C2 cycle, tetrapyrrole biosynthesis as a precursor for synthesis of the porphyrin ring, and as substrate for amino-transferases (Forde and Lea, 2007; Hildebrandt et al., 2015). Calculation of amino acid frequencies in the theoretical proteome of Arabidopsis showed that Glu is the third most frequent residue (~7%) in proteins (L and S are more frequent at ~9%; Hildebrandt et al., 2015). The abundance and frequency of Glu in the plant cell could explain why plants evolved a specific glutamyl peptidase with both exo- and endo-peptidase activity. The insensitivity of CGEP activity to the neighboring residues of the target Glu further supports a role of CGEP in Glu homeostasis, because it makes CGEP very effective in liberating Glu. The specificity for Glu must lie in its side chain interacting with the specific residues in the substrate cavity of CGEP. High-resolution structure determination of CGEP together with substrate will be needed to understand the residues contributing to the Glu specificity.

Physiological Role of CGEP and Position within the Chloroplast Peptidase Network
Chloroplast proteostasis involves a network of chaperones and peptidases (Majsec et al., 2017). The in vivo physiological impacts of several chloroplast peptidases have been characterized and loss-of-function mutants have shown a wide range of phenotypes, from no visible phenotypes, cotyledon phenotypes, variegated phenotypes, and virescent-leaf phenotypes (often with the strongest lack of chlorophyll in the youngest leaves) through to embryo- or seedling-lethal phenotypes (for review, see Kmiec et al., 2014; Rigas et al., 2014; Adam, 2015; Xie et al., 2015; Nishimura et al., 2016, 2017; Kato and Sakamoto, 2018). Furthermore, genetic interactions have been observed within members of the same peptidase system, such as the thylakoid FTSH system (Moldavski et al., 2012; Kato and Sakamoto, 2018), the stromal CLP system, and the thylakoid lumen DEG system (Butenko et al., 2018). However, more recently, genetic interactions have been shown between different chloroplast peptidase systems, such as PREP and OOP (Teixeira et al., 2017), FTSH and CLP (Park and Rodermel, 2004), and VAR2 and EGY1 (Qi et al., 2020). There are various possible explanations for such genetic interactions, including (partial) functional redundancies, protein-protein interactions affecting assembly and stability, or threshold effects in proteostasis. Genetic interactions can also be explained by the concept that degradation of a single protein likely involves a sequence of cleavage events involving the sequential activity of different proteases. For example, large protein substrates will require at least partial, often ATP-dependent, unfolding to initiate degradation and ultimately require exo-peptidases to generate individual amino acids. Relatively little is known about substrate selection mechanisms of chloroplast peptidases and possible hierarchies in protein degradation cascades, in part due to the lack of known tagging systems or degrons.
Here, we showed that CGEP genetically interacts with the chloroplast stromal CLP protease system, but not with thylakoid-bound FTSH2. When crossed into various clp mutants (clpr2-1 or clpt1 clpt2 double mutant), the cgep-2 null mutation results in loss of plant growth and biomass, demonstrating the physiological significance of CGEP. Similar complementation experiments with catalytically inactive CGEP or CGEP that lacks C-terminal autocleavage further support the physiological significance of full CGEP activity. The CLP endo-peptidase system has no known upper limit of substrate size, due to the ATP-dependent capacity of the chaperone component of the CLP protease system to unfold substrates (Rodriguez-Concepcion et al., 2018). Furthermore, once unfolded and delivered into the CLP protease cavity, cleavage occurs without site-specificity, releasing peptides in the range of seven to nine amino acids. By contrast, CGEP has an upper substrate size limit of ~25 kD and, based on synthetic peptides (Laing and Christeller, 1997; Yamauchi et al., 2001) and the PICS experiments shown here, can cleave even short peptides through its exo-peptidase activity. If indeed the CLP and CGEP peptidases contribute to degradation of a shared set of substrates, it seems most likely that CLP acts mostly upstream of CGEP. The comparative proteomics analysis of wild type and cgep-2 showed that loss of CGEP does not result in significant differences in other stromal proteolytic systems, including the CLP system. Moreover, loss of CGEP did result in a significant increase in plastid ribosomal proteins, without affecting the rest of the translational machinery or RNA metabolism. By contrast, loss of CLP protease capacity results in increases in protein initiation, elongation factors, and RNA metabolism, but not in ribosomal proteins. This suggests that CGEP has a unique function within the proteostasis network.
Finally, there were early reports of several endogenous CGEP inhibitors (~8, 20, and 25 kD) in cucumber leaf extracts, one of which was heat-sensitive (Yamauchi et al., 2001). The reducing agent DTT was reported to activate glutamyl peptidase activity in an enriched leaf fraction (Forsberg et al., 2005). Both chloroplast endogenous CGEP inhibitors and redox regulation of CGEP would be a way to control in vivo CGEP activity, but our extensive CGEP protein interaction studies with in vivo samples did not identify candidate proteinaceous inhibitors, nor did we observe any impact of DTT on the peptidase activity of CGEP in vitro. It thus remains to be determined how CGEP activity is regulated in the chloroplast.

A Role for CGEP in Regulation of Starch Metabolism?
The comparative proteomics and coexpression analyses both suggest that the function of CGEP is directly or indirectly associated with starch metabolism. Indeed, Smith and Zeeman (2020) state that no master regulator of the starch biosynthesis pathway has been found and that it seems likely that there are multiple controls at the transcriptional and post-translational levels, depending on the organ and species concerned. Interestingly, it has been suggested that in specific species and tissues, starch phosphorylase may be under post-translational control through phosphorylation and protein degradation (Yu et al., 2001; Young et al., 2006; Goren et al., 2018), but our data did not indicate that Arabidopsis starch phosphorylase is cleaved by CGEP. A role of CGEP in regulating starch metabolism could be either direct downregulation of specific starch enzymes through degradation, or N- or C-terminal trimming events that change activity directly or indirectly by affecting protein-protein interactions or subchloroplast localization. Finally, as mentioned in the introductory paragraphs, the CGEP homolog in maize is enriched in bundle-sheath chloroplasts as compared to mesophyll chloroplasts (Majeran et al., 2010). Due to the distribution of C4 photosynthesis across these two cell types, transient starch is mostly synthesized in the bundle-sheath chloroplast. The preferential accumulation of CGEP in the same cell type as starch metabolism further supports a regulatory role of CGEP in starch metabolism.

CONCLUSIONS AND OUTLOOK
Here we characterized a chloroplast CGEP peptidase and showed it has exo-and endo-glutamyl peptidase activity, through which we identified a novel mechanism to increase substrate accessibility to the active site. Complementation experiments and comparative proteomics confirmed the physiological relevance of CGEP. Structural analysis of CGEP and its substrate selection mechanism are now required for understanding the role of CGEP at the molecular level and the position of CGEP in the chloroplast protease network. An in-depth analysis of starch phenotypes is warranted, given the coexpression of CGEP with starch metabolic enzymes.

Phylogenetic Analysis
To generate a phylogenetic cladogram, 41 CGEP proteins from 35 species across the tree-of-life were aligned and trimmed using the tool MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/). The cladogram was generated as described in Bhuiyan et al. (2016). Untrimmed alignment is provided in Supplemental Dataset S1.

Generation of CGEP Antiserum
The nucleotide sequence encoding amino acids 640 to 782 of CGEP was amplified by PCR (for primers, see Supplemental Table S4). The resulting DNA fragment was ligated into restriction sites (BamHI and XhoI) of the C-terminal His affinity tag of the pET21a expression vector. BL21 Escherichia coli cells were transformed with a pET21a vector harboring this truncated CGEP gene, and cells were harvested from liquid culture after addition of 1 mM of isopropyl β-D-1-thiogalactopyranoside for 3-h incubation at 22°C. The overexpressed proteins were solubilized in 200 mM of NaCl, 50 mM of Tris, and 8 M of urea at pH 8 and purified on a nickel-nitrilotriacetic acid agarose resin matrix. A polyclonal antibody against this truncated CGEP protein was raised in rabbits by injecting purified antigen (Alfa Diagnostic International). Antisera were affinity-purified against the same antigen coupled to a HiTrap N-hydroxysuccinimide ester-activated column (GE Healthcare Life Sciences).

T-DNA Insertion Mutants, Transgenic Plants, Genotyping, and Phenotyping
T-DNA-insertion lines cgep-1 (SAIL_574_D03), cgep-2 (SALK_066117), and cgep-3 (SAIL_589_G08) were obtained from the Arabidopsis Biological Resource Center. T-DNA insertion lines were identified by genotyping and insertion was confirmed by DNA sequencing. The T-DNA insertion lines clpr2-1 and clpt1 clpt2 were described in Rudella et al. (2006) and Kim et al. (2015). Primers for genotyping are listed in Supplemental Table S4. Transcripts levels were determined by RT-PCR using RNA collected from homozygous plants as described in Bhuiyan et al. (2016).
To generate complemented transgenic plants, CGEP (wild-type form), a catalytically inactive form of CGEP (mutated at S781R), CGEP-C1-STREPII (CGEP-E928A-D931A), and CGEP-C2-STREPII (CGEP-E946A-E949A-E951A) were generated from complementary DNA and mutations introduced as indicated. Primers used for cloning and mutation introduction are listed in Supplemental Table S4. See also "CGEP Site-Directed Mutagenesis and In Vitro Proteolytic Activity Assays." A nucleotide sequence encoding the StrepII tag was added in the reverse primer before the stop codon to generate a C-terminal tag for protein detection and affinity purification. PCR products were cloned into pCR8 topo vector (Invitrogen) and verified by DNA sequencing. This clone was ligated into a gateway pEARLYGATE100 vector (Arabidopsis Biological Resource Center) by using LR enzymes (Invitrogen). These vectors were transformed into Agrobacterium and subsequently transformed into the cgep-2 null mutant and the double mutant clpr2-1xcgep-2 by the floral-dip method. Transgenic plants were selected using BASTA (for pEARLYGATE100 vector)-containing plates. Plants surviving on selective medium were genotyped, and confirmed transgenics were transferred to soil for seed production and generation of homozygous progeny. Plants were grown in a growth chamber with long-day conditions (18-h light/6-h dark) and temperature at 22°C with 130 or 100 μmol photons m⁻² s⁻¹ light intensity. For transcript analysis, total RNA was extracted from Arabidopsis leaves using the RNeasy Plant Mini Kit (Qiagen), and RT-PCR was carried out as mentioned above. To determine the statistical significance in growth phenotypes (rosette diameter and rosette weight), two-sample t tests were used.
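As an illustration of the two-sample t test mentioned above, the following sketch computes the pooled-variance (Student) t statistic and its degrees of freedom using only the Python standard library. Whether the study used the equal-variance or the unequal-variance (Welch) form is not stated, so the pooled form is an assumption here, and the measurement values are invented for the example. Converting the statistic to a P value requires the t distribution, for instance through `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Invented rosette diameters (cm) for two genotypes, eight plants each.
line_a = [6.1, 5.8, 6.4, 6.0, 5.9, 6.2, 6.3, 6.0]
line_b = [4.9, 5.2, 5.0, 4.7, 5.1, 5.3, 4.8, 5.0]
t_value, dof = pooled_t_statistic(line_a, line_b)
```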

Pigment Concentrations
Chlorophyll and carotenoid concentrations were determined by absorbance spectrometry after extraction in 80% acetone (Porra et al., 1989).

Chloroplast and Stroma Isolation
Chloroplasts were isolated on Percoll step gradients from mature rosettes of 4-week-old plants as described in Bhuiyan et al. (2015). Stroma was isolated from total chloroplast by centrifugation at 100,000g for 30 min at 4°C. Protein concentrations were determined using the BCA Protein Assay kit (Thermo Fisher Scientific).

Comparative Proteomics and MS
For protein identification and quantification, each gel lane was cut into consecutive gel slices, followed by in-gel digestion using trypsin and subsequent peptide extraction, as described by Friso et al. (2011). Peptide extracts for each gel band were then analyzed by on-line nano-LC-MS/MS using a Linear Trap Quadrupole Orbitrap (Thermo Fisher Scientific). Resulting spectral data were searched against the predicted Arabidopsis proteome (TAIR10), including a small set of typical contaminants and the decoy, as described by Nishimura et al. (2013). Only proteins with three or more matched spectra were considered. Protein abundances were quantified based on normalized adjusted spectral counts, as explained in Friso et al. (2011). Significance analysis to determine differentially accumulated proteins between wild type and cgep-2 was done as described by Kim et al. (2013). MS-derived information, as well as annotation of protein name, location, and function for the identified proteins, can be found in the PPDB (http://ppdb.tc.cornell.edu). The MS data have been deposited to the PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository, and are available as PXD017189 in the ProteomeXchange (http://www.proteomexchange.org/).

Coexpression Analysis
Coexpressed genes for CGEP, as well as five additional organellar protease genes (CLPR1, CLPP5, DEG2, PREP1, and PREP2), were downloaded from the plant coexpression database ATTED-II (http://atted.jp/) using the most recent dataset, Ath-m (Obayashi et al., 2018). The top-100 most strongly coexpressed genes, ranked by mutual rank (MR), for each bait were used for detailed analysis. Protein function was based on the MapMan annotation system (https://mapman.gabipd.org/mapman) integrated and extensively updated in the Plant Proteome DataBase. The functional enrichment test for the coexpressed genes for each bait was based on the hypergeometric test (Majsec et al., 2017).
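The two quantities used in this analysis can be sketched as follows. The sketch takes MR as the geometric mean of the two directional correlation ranks and uses a one-sided (upper-tail) hypergeometric test. Function names are hypothetical, and the exact settings of Majsec et al. (2017) may differ.

```python
from math import comb, sqrt


def mutual_rank(rank_a_to_b, rank_b_to_a):
    """Mutual rank: geometric mean of the rank of B for bait A and of A for bait B."""
    return sqrt(rank_a_to_b * rank_b_to_a)


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail probability P(X >= hits).

    population: number of genes in the background
    annotated:  background genes belonging to the functional bin
    sample:     number of coexpressed genes tested (e.g. 100)
    hits:       coexpressed genes belonging to the functional bin
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small returned probability indicates that the functional bin is overrepresented among the coexpressed genes relative to the background.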

Coimmunoprecipitation, StrepII Affinity Purification, and Gel Filtration
For coimmunoprecipitation with CGEP antiserum, the same protocol was followed as described by Bhuiyan et al. (2015), except that stroma was used in this study instead of thylakoids. StrepII-tagged protein purification was carried out as described by Olinares et al. (2011) and Kim et al. (2015), except that Strep-Tactin (IBA Lifesciences) superflow, high-capacity resin was used in this study and biotin was used as an eluent. Gel filtration of stromal proteome was carried out by fast protein liquid chromatography as described by .

Immunoblotting
For immunoblotting, 10 µg of protein (unless otherwise mentioned in the figure legends) was separated by SDS-PAGE, followed by transfer to 0.2-µm nitrocellulose membrane. Proteins were detected by electrochemiluminescence using standard procedures.

CGEP Site-Directed Mutagenesis and In Vitro Proteolytic Activity Assays
Mature Arabidopsis CGEP (starting at amino acid 62; without cTP) was cloned by using forward (AtCGEP-M-FW-BamHI) and reverse (AtCGEP-M-RV-XhoI) primers (primers are listed in Supplemental Table S4). The forward primer contains a BamHI site and the reverse primer contains an XhoI site. The resulting PCR fragment was ligated into a pCR8 topo vector and confirmed by DNA sequencing. A pCR8 vector harboring the CGEP gene was digested with the BamHI and XhoI restriction enzymes. The resulting DNA fragment was ligated into the restriction sites (BamHI and XhoI) of a pGEX vector that has an N-terminal GST tag. Three AtCGEP mutants (GST-CGEP-S781R, GST-CGEP-C1 [CGEP-E928A-D931A], and GST-CGEP-C2 [CGEP-E946A-E949A-E951A]) were constructed by using a PCR method as described by Bhuiyan et al. (2016). For the mutant GST-CGEP-S781R, the C-terminal part of the mature protein was amplified from a pCR8 plasmid harboring the CGEP gene by using specific forward (AtCGEPS781R-FW) and reverse (AtCGEP-M-RV-XhoI) primers. Forward primer AtCGEPS781R-FW contains the mutation site TCC (Ser) to CGC (Arg). The N-terminal part of the mature protein was amplified by using specific forward (AtCGEP-M-FW-BamHI) and reverse (AtCGEPS781R-RV) primers. Reverse primer AtCGEPS781R-RV contains the introduced site TCC (Ser) to CGC (Arg). The two amplified fragments were gel-purified, mixed, and used as a template (1:1) for second-round PCR to amplify the mature protein by using the forward and reverse primer sets AtCGEP-M-FW and AtCGEP-M-RV, respectively. GST-CGEP-C1 was amplified the same way as GST-CGEP-S781R, except that different primer sets were used to introduce two mutations, from GAA (Glu) to GCA (Ala) and from GAT (Asp) to GCT (Ala). GST-CGEP-C2 was amplified by using AtCGEP-M-FW-BamHI as a forward primer and AtCGEP-XhoI-E946AE949AE951A-RV as a reverse primer. This reverse primer contains three mutation sites, E946A (AGT to ACT), E949A (AGC to ACC), and E951A (AAG to ACG).
The PCR fragments were ligated into a pCR8 topo vector and the mutations were confirmed by DNA sequencing. pCR8 vectors harboring different CGEP mutants were digested with BamHI and XhoI, and the resulting fragments were ligated into the same sites of the pGEX-5 vector to fuse GST to the N terminus of CGEP. BL21 E. coli cells were transformed with pGEX vectors harboring the various CGEP constructs, and cells were harvested from liquid culture after induction with 1 mM of isopropyl β-D-1-thiogalactopyranoside for 3 h at 22°C. Overexpressed wild-type and mutant versions of CGEP in E. coli were solubilized in 500 mM of NaCl, 50 mM of Tris, and 10% (v/v) glycerol, at pH 8, and purified on a glutathione resin matrix. The purified protein was dialyzed by using a dialysis cassette (Slide-A-Lyzer; Thermo Fisher Scientific) against a buffer of 100 mM of NaCl, 50 mM of Tris, and 10% (v/v) glycerol. After dialysis, the protein was concentrated by using Microcon Centrifugal Filter units (Millipore). In vitro proteolytic activity assays were performed by incubating recombinant proteins in 100 mM of NaCl, 50 mM of Tris, and 10% (v/v) glycerol with substrate proteins at 37°C. The reaction was stopped by adding 3% SDS (w/v), followed by separation of the protein products by SDS-PAGE and staining with Coomassie Brilliant Blue. Additional reactions for rCGEP were carried out with the addition of dithiothreitol (DTT; 5 mM) and various concentrations of NaCl from 0.1 to 0.5 M, but these additions to the reaction mixture did not affect degradation (data not shown).

PICS for Determination of Protease Cleavage Specificity
The PICS procedure was based on the method described in Schilling et al. (2011) and Biniossek et al. (2016). To generate peptide libraries, 1 mg of protein from the soluble leaf proteome (in 50 mM of HEPES, 40 µg mL⁻¹ of bestatin, and 10 µg mL⁻¹ of phosphoramidon) was mixed with an equal volume of 8 M of GuHCl to denature proteins. DTT was added to a final concentration of 5 mM and the samples were incubated at 65°C for 1 h. After cooling to room temperature, cysteines were alkylated by addition of 15 mM of iodoacetamide followed by incubation for 20 min in darkness. Excess iodoacetamide was quenched with 10 mM of DTT and the sample was gradually diluted 8-fold with 200 mM of HEPES at pH 8. Protein extracts were digested with 20 µg of trypsin, 20 µg of GluC, or 15 µg of LysC per 1 mg of protein at 37°C for 16 h in 1 M of GuHCl and 200 mM of HEPES at pH 8. Any precipitate was removed by centrifugation, and an aliquot of the sample (1 µg) was resolved by SDS-PAGE and silver staining to ensure the protein digestion was complete. The Ser protease inhibitor Pefabloc-SC (Sigma-Aldrich) was added to a final concentration of 5 mM to inactivate the digestion proteases trypsin, LysC, or GluC. The peptide libraries were then acidified with formic acid and desalted using 1-mL Resprep C18 columns (Restek). The acetonitrile in the elution buffer was removed with a SpeedVac (Thermo Fisher Scientific) and the peptides were suspended in 50 mM of HEPES and 100 mM of NaCl at pH 8, as detailed in "PICS Experiment 2". Alternatively, peptides were dimethylated before desalting and carried forward, as detailed in "PICS Experiment 1".

PICS Experiment 1
After digestion of the proteome (with trypsin, GluC, or LysC), peptides were dimethylated with CD2O and then desalted with 1-mL Resprep C18 columns as described above. Purified dimethylated peptide libraries (120-170 µg) were reacted with 6.5 µg of either rCGEP or catalytically inactive rCGEP-S781R and incubated for 16 h at 37°C. CGEP activity was abolished by heating at 70°C for 10 min. To remove small molecules containing primary amines, samples were again desalted with 1-mL Resprep C18 columns as described above and each sample was suspended in 100 µL of 200-mM HEPES at pH 8. Samples/peptides were then reacted with 5 µL of 10-mM Sulfo-N-hydroxysuccinimide-SS-Biotin (Pierce/Thermo Fisher Scientific; 0.5 mM final concentration) for 2 h at 25°C. One milliliter of Strep-Tactin resin (IBA Lifesciences) was washed 5× with 50 mM of HEPES and 150 mM of NaCl at pH 8 and the resin was split among six tubes. Samples were then added to the resin and incubated for 2 h at 25°C with shaking. Each sample was then transferred to a 0.5-mL Pierce Spin Filter (Thermo Fisher Scientific). Resin was washed 10× with 500 µL of wash buffer, using only a brief spin in a desktop centrifuge to avoid drying the resin. Three hundred microliters of 50 mM of HEPES and 20 mM of DTT at pH 8 were added and incubated for 10 min at 25°C. Peptides were eluted by centrifugation into a clean tube, followed by an additional 200 µL of the above buffer. Samples were desalted using Resprep C18 columns (Restek) as described above, and were suspended in 30 µL of 2% (v/v) acetonitrile and 2% (v/v) formic acid for LC/MS analysis.

PICS Experiment 2
Fifty microliters of the trypsin and GluC peptide libraries (1 mg mL⁻¹ in 50 mM of HEPES and 100 mM of NaCl at pH 8) were mixed with 5 or 10 µL of rCGEP (1 mg mL⁻¹ in 50 mM of Tris-HCl, 100 mM of NaCl at pH 8, and 30% [v/v] glycerol) or rCGEP-S781R, and incubated for 15 h at 37°C. After this incubation, peptides were dimethylated with either light (control: S781R) or heavy (sample: CGEP) formaldehyde. CH2O (light formaldehyde) or CD2O (heavy formaldehyde) at 2 M was added to give a final concentration of 40 mM, followed immediately by addition of 1 M of NaCNBH3 to give a final concentration of 30 mM. The samples were incubated for 2 h at 25°C and then a second aliquot of CH2O and NaCNBH3 was added, as above, to give 80 and 60 mM final concentrations, respectively, and the samples were incubated overnight at 25°C. The dimethylation reaction was quenched with Gly at a final concentration of 0.1 M. The sample (rCGEP digest, heavy label) and control (rCGEP-S781R digest, light label) reactions were then mixed in a fresh tube. A 5-µg aliquot was desalted with a C18 ZipTip (EMD Millipore) using the manufacturer's guidelines and the peptide eluate was brought to dryness with a SpeedVac (Thermo Fisher Scientific). Samples were suspended in 20 µL of 2% (v/v) acetonitrile and 2% (v/v) formic acid for LC/MS analysis.
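The reagent additions above are simple dilution arithmetic, sketched below. This is a minimal illustration, not the authors' worksheet: the function names are hypothetical, the reaction volume in the usage note is an assumed value, and volumes may be in any consistent unit.

```python
def final_concentration(stock_conc, stock_vol, sample_vol):
    """Concentration of a reagent after adding stock_vol of stock to sample_vol of sample."""
    return stock_conc * stock_vol / (sample_vol + stock_vol)


def stock_volume_needed(stock_conc, target_conc, sample_vol):
    """Volume of stock to add to sample_vol so the reagent reaches target_conc.

    Solves target = stock * v / (sample_vol + v) for v.
    """
    if target_conc >= stock_conc:
        raise ValueError("target concentration must be below the stock concentration")
    return target_conc * sample_vol / (stock_conc - target_conc)
```

For example, under the assumption of a reaction of 60 volume units, `stock_volume_needed(2000.0, 40.0, 60.0)` gives the volume of 2 M (2000 mM) formaldehyde needed to reach 40 mM. Volume added by subsequent reagents is ignored.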
For LC-MS/MS analysis, 6.4 µL of each sample was loaded onto a C18 trapping column and then eluted onto a 15-cm × 75-µm I.D. C18 PepMap column interfaced to a Linear Trap Quadrupole Orbitrap (Thermo Fisher Scientific). A 90-min linear gradient from 3% to 40% solvent B was used to separate the peptides. A typical data-dependent acquisition method was used whereby MS spectra were acquired in the Orbitrap at 100-K resolution followed by five data-dependent MS/MS scans in the ion trap.
Peak lists (mgf files) for database searching were generated from Thermo XCalibur (Thermo Fisher Scientific) raw data files using DTA Supercharge (http://msquant.sourceforge.net/). The peak lists were searched using the tool MASCOT 2.4 (Matrix Science) against TAIR10, appended with all reverse sequences (Decoy) and common contaminants (71,149 sequences and 29,099,754 residues). After an initial database search performed at 30-ppm MS tolerance and 0.8-D MS/MS tolerance, the peak lists were recalibrated as described in Friso et al. (2010). A semispecific enzyme search was then conducted with semiArgC and semiGluC (V8), allowing for three missed cleavages, 6-ppm MS tolerance, and 0.8-D MS/MS tolerance. For PICS Experiment 1, fixed modifications were carbamidomethylation of Cys and dimethyl Lys (heavy, +32 D), and variable modifications were oxidized Met, pyroGlu N-term Gln, dimethyl N-term (heavy, +32 D), and Thioacyl N-term. For PICS Experiment 2, fixed modifications were carbamidomethylation of Cys and dimethyl Lys, and variable modifications were oxidized Met, pyroGlu N-term Gln, and dimethyl N-term (light, +28 D or heavy, +32 D). Another search including singly methylated N-term was conducted for select files to detect methylated Nt Pro. The database search results were parsed and sorted in the software Microsoft Excel.
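In PICS, the peptides identified by the semispecific search represent the prime side of each cleavage site, and the non-prime side is inferred from the protein database. The following is a minimal sketch of that inference, not the authors' Excel workflow. The function names and the four-residue flank are assumptions, and the peptide is matched to its first occurrence only.

```python
from collections import Counter


def cleavage_window(protein, peptide, flank=4):
    """Return (nonprime, prime) residues around the N-terminal cleavage site of a peptide.

    nonprime holds up to `flank` residues ending at P1; prime starts at P1'.
    Returns None if the peptide is absent or begins at the protein N terminus.
    """
    start = protein.find(peptide)
    if start <= 0:
        return None
    nonprime = protein[max(0, start - flank):start]
    prime = protein[start:start + flank]
    return nonprime, prime


def p1_counts(protein_db, peptides):
    """Count the P1 residue (immediately before each peptide) across identified peptides."""
    counts = Counter()
    for pep in peptides:
        for seq in protein_db.values():
            window = cleavage_window(seq, pep)
            if window is not None:
                counts[window[0][-1]] += 1
                break  # use the first matching protein only
    return counts
```

A strong excess of one residue in `p1_counts`, after removing the sites expected from the library protease, points to the P1 preference of the test protease.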
Sequence logo and iceLogo plots were generated with the tool iceLogo v.1.2 (http://www.proteomics.be). The complete TAIR10 proteome was used as a background to normalize for natural amino acid abundance in the library.
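The background normalization can be illustrated with a simplified percent-difference score for one position of the cleavage window. This is a sketch of the general idea only: it omits the significance testing that iceLogo applies, and the function names are hypothetical.

```python
from collections import Counter


def residue_frequencies(residues):
    """Fractional frequency of each residue in an iterable of one-letter codes."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def percent_difference(observed_residues, background_sequence):
    """Observed minus background frequency, in percentage points, for each residue."""
    obs = residue_frequencies(observed_residues)
    bg = residue_frequencies(background_sequence)
    return {
        aa: 100.0 * (obs.get(aa, 0.0) - bg.get(aa, 0.0))
        for aa in set(obs) | set(bg)
    }
```

Positive values mark residues that are overrepresented at that position relative to the background proteome, and negative values mark underrepresented residues.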

Supplemental Data
The following materials are available.
Supplemental Figure S3. Genetic interactions of CGEP with the CLP and FTSH2 chloroplast proteases.
Supplemental Figure S4. Comparative proteomics of wild type and cgep-2.
Supplemental Figure S5. MS/MS analysis of endogenous Arabidopsis CGEP protein accumulation to determine autocatalytic processing of the CGEP C terminus.
Supplemental Figure S6. A sequence alignment with the predicted secondary structures of S9B dipeptidyl aminopeptidase IV in the Gram-negative bacterium S. maltophilia (PDB:2ECF; Nakajima et al., 2008) and CGEP from the dicotyledons Arabidopsis, B. rapa, and P. trichocarpa.
Supplemental Figure S7. Top views of the Arabidopsis CGEP 3D structural model generated from the iTASSER server with side views shown in Figure 9.
Supplemental Figure S8. Genotyping and molecular characterization of transgenic CGEP complemented lines.
Supplemental Figure S10. Examples of affinity purification of in vivo CGEP and transgenic variants with the objective to identify interacting proteins.
Supplemental Table S1. Distribution of CGEP protein homologs across the species tree-of-life, detailing the 41 proteins for the phylogenetic analysis and the length of their C-terminal extensions.
Supplemental Table S3. Peptides identified in CGEP immunoprecipitated from wild-type Arabidopsis plants and their associated MS information.
Supplemental Table S4. Primers used in this study.
Supplemental Dataset S1. Untrimmed sequence alignment of the 41 CGEP homologs listed in Figure 1A and detailed in Supplemental Table S1.
Supplemental Dataset S2. Comparative proteomics of wild type and cgep-2, with all identified proteins and their annotation, spectral count data, and significance analysis.
Supplemental Dataset S3. Coexpression analysis of CGEP and five other proteases based on the microarray dataset Ath-m.c7-0, and their MR values from the database ATTED-II.
Supplemental Dataset S4. Identified peptides with associated information for PICS Experiment 1.
Supplemental Dataset S5. Identified peptides with associated information for PICS Experiment 2.
Supplemental Text S1. Experiments to determine if CGEP forms stable interactions with other proteins using affinity purification with anti-CGEP serum, or streptavidin resins, using leaf extracts of Arabidopsis transgenic lines expressing either CGEP-STREPII or CGEP-S781R-STREPII.