Expanding the repertoire of secretory peptides controlling root development with comparative genome analysis and functional assays

Highlight
Comparative genome analysis combined with transcriptome mining identified secreted peptides possibly involved in root development. When applied to Arabidopsis roots, selected candidates affected growth or branching in specific ways.


Introduction
Plants are complex organisms that consist of distinct cell types organized in tissues. Separate plant organs as well as neighbouring cells exchange a wide range of signals to coordinate development and respond to environmental stimuli. However, the phytohormones that had initially been recognized to control plant growth are relatively few in number. In recent years, peptides secreted into the apoplast by plant cells have also been identified as extracellular signals involved in various biological processes, including development (Grienenberger and Fletcher, 2015;Murphy et al., 2012). These bioactive molecules are referred to hereafter as small secretory peptides (SSPs). Most SSPs are synthesized as preproproteins from which the signal sequence is cleaved upon targeting in the endoplasmic reticulum and further processed by successive proteolytic cleavages through the secretory pathway. Subclasses of cysteine-poor SSPs also undergo additional post-translational modifications, among which proline hydroxylation, hydroxyproline arabinosylation, and tyrosine sulfation have been documented (Matsubayashi, 2014).
Because plants are sessile organisms, they have evolved a remarkable developmental plasticity in order to adapt to a wide range of ecological niches (Guyomarc'h et al., 2010). For example, embryonic roots grow and branch to produce the entire root system through a finely coordinated developmental process that integrates endogenous and environmental cues. Multiple reports have shown that SSPs play an important role in meristem establishment and maintenance, cell division, lateral root (LR) initiation, development, and emergence (recently reviewed in Delay et al., 2013a;Somssich and Simon, 2012).
In Arabidopsis, the LR primordium (LRP) is formed through successive coordinated cell division events, initiated with the first asymmetric division of the pericycle founder cells, and leading to the emergence of the LR (Malamy and Benfey, 1997). The study of promoter-reporter constructs revealed that GOLVEN (GLV) genes are expressed differentially in specific cells and at specific stages during this developmental programme (Fernandez et al., 2013). Overproduction of GLV peptides resulted in a decreased number of LRs and perturbed cell divisions in LRP (Fernandez et al., 2013;Meng et al., 2012). Besides its known role in floral organ abscission, the INFLORESCENCE DEFICIENT IN ABSCISSION (IDA) peptide, together with its receptors HAESA (HAE) and HAESA-Like 2 (HSL2), has recently been shown to be involved in LR emergence (Kumpf et al., 2013). Moreover, a role in LR development has been proposed for the C-TERMINALLY ENCODED PEPTIDE 1 (CEP1) in Arabidopsis and Medicago truncatula, as demonstrated by the LR inhibition resulting from CEP1 overexpression or the application of the peptide (Delay et al., 2013b;Imin et al., 2013;Ohyama et al., 2008). Finally, a regulatory module has been identified in which the ERF115 transcription factor, specifically expressed in the root quiescent centre (QC), acts as a rate-limiting factor of cell division and is a direct activator of a phytosulfokine peptide (PSK5) known to control cell division (Heyman et al., 2013).
Previous studies suggest that plant genomes contain more SSP genes than those identified so far, and that the function of many of them remains to be established (Hanada et al., 2013;Lease and Walker, 2006, 2010;Okamoto et al., 2014;Silverstein et al., 2007). Indeed, the annotation of genes coding for SSPs is problematic because they harbour fewer characteristics of protein-coding sequences than larger genes and homology search linking sequence and function is restricted to domains coding for just a few amino acid residues conserved across SSP families. Therefore, bioinformatic pipelines relying simply on sequence homology do not accurately predict SSP genes (Oelkers et al., 2008). Furthermore, hypothetical short open reading frames (ORFs) may arise by chance, albeit without function. As a consequence, small ORFs are often under-predicted or systematically removed in genome annotation projects, as was the case in early releases of the Arabidopsis genome. Additionally, the detection of mature SSPs from crude plant tissue extracts is difficult because they are present at very low physiological concentrations (nanomolar range) and are generally masked by degradation products of larger and much more abundant proteins. Hence, it is likely that only a portion of the functional SSPs are known to date.
This study presents a refined method to identify unknown SSPs encoded in plant genomes without prior knowledge of their sequence. On the assumption that SSPs share short conserved oligopeptide stretches, the authors fine-tuned pattern recognition algorithms based on known plant SSP regulators and expanded SSP families to 32 species, including crops. The authors further investigated whether previously uncharacterized SSPs might be involved in root development and showed that some of the corresponding genes were expressed in specific cell types and at particular stages of LR initiation. Finally, the study demonstrated that synthetic peptides matching these SSP conserved motifs strongly alter LR emergence.

Materials and methods
Selection of short proteins with signal peptide
As the de novo detection of secretory peptides is sensitive to the quality of the gene models, five sequenced plant genomes with consistently improved annotations were selected: Arabidopsis thaliana (TAIR10), rice (Oryza sativa; IRGSPbuild5 and MSU6.1) (Ouyang et al., 2007;Tanaka et al., 2008); poplar (Populus trichocarpa; JGI v156) (Tuskan et al., 2006); grapevine (Vitis vinifera; Genoscope v1) (Jaillon et al., 2007) and maize (Zea mays; ZmB73_5a) (Schnable et al., 2009). For all five species, genome annotations had been updated at least once after their initial release at the time this analysis was conducted, thus providing quality curated data. Two rice genome annotations were processed because their annotation of small predicted proteins was complementary. Only protein sequences of less than 200 amino acids in length were kept for further analysis. The authors searched for the presence of the signal peptide in the amino-terminal domain by using SignalP v3.0 software (Bendtsen et al., 2004). The signal peptide was predicted with the neural network or hidden Markov model (HMM) profile.
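The length and signal-peptide selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `read_fasta` and `select_candidates` are hypothetical, and the signal-peptide call is represented by a user-supplied predicate (for instance a lookup in a table of SignalP results prepared separately), because the SignalP software itself is not invoked here.

```python
from typing import Callable, Dict, Iterable

MAX_LENGTH = 200  # cut-off stated in the text: proteins shorter than 200 amino acids


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier = first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def select_candidates(
    records: Dict[str, str],
    has_signal_peptide: Callable[[str], bool],
    max_length: int = MAX_LENGTH,
) -> Dict[str, str]:
    """Keep short proteins for which the (assumed) predicate reports a signal peptide."""
    return {
        name: seq
        for name, seq in records.items()
        if len(seq.rstrip("*")) < max_length and has_signal_peptide(name)
    }
```

In use, `has_signal_peptide` could simply test membership in a set of identifiers flagged by the prediction software, for example `select_candidates(records, flagged_ids.__contains__)`.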

De novo conserved secretory motif detection
First, the last 50 amino acids from the candidate secretory peptides were searched against each other by using the FASTA program (Pearson, 2000) with the BLOSUM50 scoring matrix to detect mildly related sequences. Second, the all-against-all FASTA search results were subjected to the Markov Cluster Algorithm (MCL version 09-308, inflation value 1.5) (Enright et al., 2002) to group the sequences into clusters based on the e-value. Special attention was paid to the inflation value in the MCL algorithm because it controls the connectivity between related protein subgroups and the main challenge in the delineation of secretory peptide families is the weak sequence similarity between members. Third, sequences in each cluster were aligned by using the multiple alignment program MUSCLE (Edgar, 2004); non-aligned gaps and non-conserved positions in the multiple alignment were removed based on the BLOSUM62 scoring matrix. Fourth, based on the remaining conserved region, each cluster was represented by an HMM profile with hmmbuild and hmmcalibrate from the HMMER (v2.3.2) package (http://hmmer.wustl.edu/). Fifth, singleton sequences that did not cluster in the previous MCL clustering were searched (hmmsearch) against the HMM profiles to identify the most closely related clusters. When an additional sequence was identified in a cluster, this sequence was combined with the pre-existing ones in that cluster, and the procedure was reinitiated from step 3. The search for a cluster was considered complete once no further sequence could be added to it.
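The iterative recruitment of singletons (steps 3 to 5) amounts to repeating a search until no cluster changes. The sketch below shows only that control flow, under stated assumptions: `recruit_singletons` is a hypothetical name, and the `best_cluster` callable stands in for the whole align, trim, profile-build and profile-search sequence, which is not reproduced here.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Assumed interface: given a sequence and the current clusters, return the
# identifier of the most closely related cluster, or None if there is no hit.
Matcher = Callable[[str, Dict[str, Set[str]]], Optional[str]]


def recruit_singletons(
    clusters: Dict[str, Set[str]],
    singletons: Iterable[str],
    best_cluster: Matcher,
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Add singletons to clusters until a full pass recruits nothing new."""
    clusters = {cid: set(members) for cid, members in clusters.items()}
    remaining = set(singletons)
    changed = True
    while changed:
        changed = False
        for seq in sorted(remaining):
            # Clusters may have grown earlier in this pass, so the matcher
            # always sees the current membership (profiles rebuilt in practice).
            cid = best_cluster(seq, clusters)
            if cid is not None:
                clusters[cid].add(seq)
                remaining.discard(seq)
                changed = True
    return clusters, remaining
```

The loop terminates because each pass either removes at least one sequence from the pool of singletons or ends the search.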
The HMM profile of each cluster was compared against all HMM profiles by using the Profile Comparer (PRC) (Madera, 2008). Then, the higher-order relationship of the clusters was determined with the MCL algorithm based on the e-values calculated with PRC. To inspect the shared conserved motif of candidate secretory cluster pairs, 'LogoMat-P' (Schuster-Böckler and Bateman, 2005) was applied to generate the pairwise HMM logos. A group of clusters linked by the PRC program was considered to be one putative secretory family (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/browse.php).
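To make the role of the inflation value more concrete, the following is a toy, pure-Python version of the Markov clustering iteration (alternating expansion and inflation of a column-stochastic matrix). It is a teaching sketch only and not the MCL program used in the study: the function name `mcl`, the pruning threshold, and the way self-loops are added are assumptions, and the conversion of e-values into edge weights is left to the caller.

```python
from typing import Dict, List, Tuple


def mcl(
    edges: Dict[Tuple[str, str], float],
    inflation: float = 1.5,
    max_iter: int = 100,
    tol: float = 1e-9,
    prune: float = 1e-6,
) -> List[List[str]]:
    """Cluster an undirected weighted graph by repeated expansion and inflation."""
    nodes = sorted({n for pair in edges for n in pair})
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in edges.items():
        m[idx[a]][idx[b]] = w
        m[idx[b]][idx[a]] = w
    for i in range(n):
        m[i][i] = max(m[i])  # self-loop weighted like the strongest edge

    def normalize(mat: List[List[float]]) -> List[List[float]]:
        # Rescale every column so that it sums to 1.
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: matrix squaring spreads flow along two-step paths.
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: element-wise power strengthens strong flows, weakens weak ones.
        inflated = normalize(
            [[x ** inflation if x > prune else 0.0 for x in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Rows with a non-zero diagonal act as attractors; their non-zero columns
    # are the cluster members.
    clusters = []
    for i in range(n):
        if m[i][i] > prune:
            members = frozenset(nodes[j] for j in range(n) if m[i][j] > prune)
            if members not in clusters:
                clusters.append(members)
    return sorted(sorted(c) for c in clusters)
```

A higher inflation value generally yields more and smaller clusters, which is why a low value such as 1.5 suits families whose members share only weak similarity.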

Analysis of SSP sequences across plant genomes
The genome annotations of 32 photosynthetic organisms were downloaded from Phytozome or genome-specific databases (Supplementary Table 1, available at JXB online). These included updated versions of the reference species genomes selected for the initial clustering, most importantly a unified genome for rice (Kawahara et al., 2013) and an updated genome assembly and annotation for poplar. Protein sequences were filtered with the same criteria as applied to the reference species genomes: protein sequence shorter than 200 amino acids with a signal peptide in the N-terminal region detected by SignalP. In total, 75,970 proteins were suitable for screening for the SSP signature, among which 35,875 contained SSP motifs as defined in the library created with the reference species (hmmpfam e-value 0.05) (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).

Microarray data normalization and compendium analysis
Transcriptome datasets were retrieved as Gene Expression Omnibus accessions: GDS1515 (Vanneste et al., 2005), GSE42896 (De Rybel et al., 2012), GSE6349 (De Smet et al., 2008), and GSE8934 (Brady et al., 2007) for the phloem and the xylem pole pericycle expression files. The full pericycle expression data, based on the J2661 Arabidopsis marker line, were a kind gift (Levesque et al., 2006). Array data were normalized with the robust multiarray average algorithm (Irizarry et al., 2003) and the absolute values, fold change (FC), and pairwise P-values were determined with the affylmGUI R package (Smyth, 2004) without adjustment. Two-factor analysis of variance (ANOVA) P-values were computed with the MultiExperiment Viewer (http://www.tm4.org/mev.html). Affymetrix probe sets were assigned to AGI gene ID according to the affy_ATH1_array_elements-2010-12-20.txt file from TAIR (www.Arabidopsis.org). Ambiguously assigned genes (multiple gene identifiers for one probe set) and microarray controls were discarded. Genes were considered significantly regulated in specific experiments when the following criteria were fulfilled: (i) absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0-2, 2-6, 0-6 h) upon LR induction in the control plants, and a two-factor ANOVA P ≤ 0.01 for the interaction between treatment and genotype (Vanneste et al., 2005); (ii) absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0-2, 2-6, 0-6 h) for both compounds [1-naphthaleneacetic acid (NAA) and naxillin] during the time course in the LR induction system (De Rybel et al., 2012); (iii) absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0-2, 2-6, 0-6 h) during the time course upon LR initiation in the sorted pericycle cells (De Smet et al., 2008); (iv) absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (xylem pole pericycle vs. phloem pole pericycle, xylem pole pericycle vs. full pericycle, full pericycle vs. phloem pole pericycle) and similar positive or negative sign for all the pairwise comparisons (Parizot et al., 2012). Additionally, a radial layer specificity was determined as described by Brady et al. (2007) and a gene was tagged when specifically expressed in the xylem or phloem pericycle pole, or in the primordium. Furthermore, an oscillation cluster association was determined as described by Moreno-Risueno et al. (2010) and a gene was tagged when expressed in phase or antiphase with DR5 oscillation.
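The selection criteria above reduce to a few boolean tests per gene, sketched here in Python. The function names are hypothetical, and one convention is assumed that the text does not spell out: fold changes are taken to be signed values (for example -2.0 for a twofold decrease), so that the absolute value and the sign can both be read from the same number.

```python
from typing import List, Optional, Tuple

Comparison = Tuple[float, float]  # (signed fold change, pairwise P-value); assumed layout

FC_MIN = 1.5   # absolute fold-change threshold from the text
P_MAX = 0.01   # P-value threshold from the text


def passes_time_course(
    comparisons: List[Comparison],
    interaction_p: Optional[float] = None,
) -> bool:
    """At least one pairwise comparison with |FC| >= 1.5 and P <= 0.01.

    If an ANOVA interaction P-value is supplied (as for the dataset that
    includes a genotype factor), it must also be <= 0.01.
    """
    hit = any(abs(fc) >= FC_MIN and p <= P_MAX for fc, p in comparisons)
    if interaction_p is not None:
        return hit and interaction_p <= P_MAX
    return hit


def passes_pericycle(comparisons: List[Comparison]) -> bool:
    """Same threshold, plus the same sign for all pairwise comparisons."""
    same_sign = all(fc > 0 for fc, _ in comparisons) or all(
        fc < 0 for fc, _ in comparisons
    )
    return same_sign and passes_time_course(comparisons)
```

For the dataset with two compounds, the time-course test would be applied to each compound separately and both results required to be true.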

Plant material and growth conditions
All experiments were conducted with wild-type Arabidopsis thaliana (L.) Heynh., accession Columbia-0 (Col-0). Seeds were surface sterilized and sown on half-strength Murashige and Skoog medium (Duchefa Biochemie B.V.) complemented with 1% (w/v) agarose and 1.5% (w/v) sucrose at pH 5.8. Seeds were stratified for at least 2 days at 4 °C. Seedlings were germinated in illuminated growth chambers under a 16 h light/8 h dark cycle (100 µmol m⁻² s⁻¹) at 21 °C. N-1-naphthylphthalamic acid (NPA) and NAA treatments and transcript level assays were as described by Himanen et al. (2002).

Gene expression analysis
Total RNA from roots 5 days after germination was isolated with TRIzol reagent (Invitrogen), followed by treatment with RNase-free DNase I (Qiagen) according to the manufacturer's instructions. The cDNA was prepared with the iScript™ cDNA Synthesis Kit (Bio-Rad) from 1 μg of total RNA and 1:10 dilutions of total cDNA were used as template for quantitative RT-PCR. Genes and primers are listed in Supplementary Table 6. Means of samples were compared with two-way ANOVA (GraphPad Prism; V6.00, GraphPad Software).

Statistical tests
Means of samples were compared with Student's t test; equality between the population variances was assessed with the F test. Data were pooled from independent biological replicates unless specified otherwise.

Identification of SSP genes in reference plant genomes
The authors searched for domains conserved across multiple plant species to identify potentially bioactive SSPs. Because the accuracy of gene models is crucial in this context, only species for which reliable genome annotations were available at the time this analysis was conducted were included: Arabidopsis, rice (Oryza sativa), poplar (Populus trichocarpa), grapevine (Vitis vinifera), and maize (Zea mays) (see Materials and Methods for details).
To benchmark SSP identification algorithms, the preproprotein primary sequences of signalling peptides known or suspected to be involved in root development (identified first in Arabidopsis in most cases) were collected. These include: CEP, CLAVATA3 (CLV3/CLE), GOLVEN/ROOT GROWTH FACTOR/CLE-LIKE (GLV/RGF/CLEL), IDA, PSK, PLANT PEPTIDE CONTAINING SULFATED TYROSINE (PSY), and additional cysteine-rich peptides (Table 1; Supplementary Table 3). In total, 195 Arabidopsis protein sequences were collected from these known secretory peptide families. Most of these short preproproteins contain an amino (N)-terminal signal peptide and a conserved carboxyl (C)-terminal end that is cleaved off to yield the mature signal. This latter sequence corresponds to the secreted bioactive portion of the peptide hormones shown in multiple cases to act as a ligand of leucine-rich repeat-receptor-like kinase (LRR-RLK) membrane proteins (Benková and Hejátko, 2009;Butenko et al., 2009;Murphy et al., 2012). The successive stages of the analytical pipeline aimed at identifying SSPs are explained below and summarized in Fig. 1.

Length: The average protein sequence length in the SSP benchmark set was 102 amino acids (Supplementary Table 3). The threshold of 200 amino acids was chosen as a conservative cut-off to exclude long protein sequences, resulting in 158,135 proteins selected from the predicted proteomes of the selected species (including splice variants). Approximately 24% of the predicted Arabidopsis proteins were shorter than 200 amino acids, yet the arbitrary protein sequence length cut-off removed only five out of 216 secretory peptides (2.3%) from the benchmark dataset [CEP (At1G31670), At3G50610, gibberellic acid-stimulated in Arabidopsis (GASA; At5G14920), putative precursor for endogenous peptide elicitor (PROPEP; At1G17750), and At1G73080].
Secretion: Of these short proteins, 39,917 were predicted to contain an N-terminal hydrophobic region recognized as a cleavable signal sequence. However, not all characterized secretory signalling peptides carry such an identifiable sequence. Among the benchmark proteins, 40 (18.5%) did not contain a conventional signal peptide sequence, which may partly be explained by the arbitrary choice for the SignalP peptide identification parameters (Emanuelsson et al., 2007).
Conserved C-terminal motif: To reduce noise in sequence comparison, only the last 50 amino acids of the proteins were considered in the all-against-all FASTA sequence similarity search (e-value cut-off 10⁻³) (Pearson, 2000). The first round of aggregation with the MCL grouped 23,442 proteins into 4,787 clusters and left out 16,475 proteins as singletons.
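The counts reported for the three filtering stages can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in the text; the variable names and the helper `percent` are introduced here for illustration.

```python
# Counts quoted in the text for the five reference species.
short_proteins = 158_135          # proteins shorter than 200 amino acids
with_signal_peptide = 39_917      # of these, predicted to carry a signal peptide
clustered = 23_442                # grouped by the first MCL round
singletons = 16_475               # left ungrouped by the first MCL round

# Benchmark set of known secretory peptides.
benchmark_total = 216
removed_by_length = 5
without_signal_peptide = 40


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Clustered proteins and singletons together account for every candidate.
assert clustered + singletons == with_signal_peptide

print(percent(removed_by_length, benchmark_total))       # benchmark lost to the length cut-off
print(percent(without_signal_peptide, benchmark_total))  # benchmark lacking a predicted signal peptide
print(percent(with_signal_peptide, short_proteins))      # short proteins retained after SignalP
```

The first two printed values reproduce the 2.3% and 18.5% given in the text; the third is derived here and is not stated in the original.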

SSP family assembly
The candidate secretory peptides were further classified according to sequence homology by combining graph clustering algorithms and pairwise profile comparisons (see Materials and Methods for details). To evaluate the performance of the clustering parameters, the assembly of the known Arabidopsis CLV3/CLE and GLV/RGF/CLEL secretory signalling peptides was examined. After the initial MCL analysis, yielding 4,787 independent clusters, the 32 CLE Arabidopsis proteins were still scattered in seven clusters (Supplementary Fig. 1) and the 11 Arabidopsis GLV proteins (including one splice variant) in five clusters (Supplementary Fig. 2). The relationship between clusters was then calculated via pairwise profile comparisons and their higher-order relationship was determined with the MCL algorithm to aggregate related clusters into larger families whenever possible. The resulting clusters and aggregated families are numbered c# and f# as listed in Supplementary Table 2. The corresponding consensus and sequences can be searched online (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).
The MCL clustering based on the protein profiles markedly improved the resolution of known secretory families. For example, the Arabidopsis GLV peptides were all grouped in a single family (Fig. 2A; Supplementary Fig. 2; Table 1; Supplementary Table 2). As expected, the topology of the cluster connectivity network built with the predicted proteins selected from the five reference species resembles the phylogenetic relationships between peptides in the family, as close sequences according to the phylogenetic tree tend to group together in the same cluster or in neighbouring clusters (Fig. 2B).
The assembly of the large CLE peptide family further illustrates the usefulness of the sequence clustering method used. A classical multiple sequence alignment of the CLE peptides identified conserved amino acid positions (Supplementary Fig. 3). In comparison, in the analytical pipeline, the TribeMCL clustering based on the FASTA search data (which removes non-aligned gaps and non-conserved positions) first grouped CLE peptides with the most similar bioactive domains, resulting in seven clusters (Supplementary Fig. 1). Next, an HMM was built to represent each cluster separately and the second round of TribeMCL clustering resolved the cluster relationship into two families (Supplementary Fig. 1, inset), which coincidentally correspond to the subgroups involved in either root apical meristem (RAM) maintenance or vascular development (Kiyohara and Sawa, 2012).
In summary, the multispecies genome-scale analytical pipeline can reconstruct known secretory peptide families and distinguish subfunctional classes without prior knowledge of specific sequences, simply by taking into consideration the preproprotein length, the presence of an N-terminal signal sequence, and the conservation of C-terminal oligopeptides.
In addition, the manual curation of previously unreported consensus sequences revealed conspicuous patterns commonly observed in known signalling peptide families. For example, a tyrosine residue was found in the conserved motif in multiple families (e.g., f131, f409, f919; Fig. 3). Such a tyrosine residue is known to be sulfated in the GLV, PSK, and PSY mature signalling peptides, where it is also preceded by an aspartic acid residue. Its presence and its post-translational modification are crucial for bioactivity (Komori et al., 2009;Matsuzaki et al., 2010;Whitford et al., 2012). The conserved motifs often end at or very near the last C-terminal residue of the precursor protein and contain one or several proline residues that might act as hinges when the peptide ligand binds to its receptor (Fig. 3). Together, these observations indicate that the global de novo sequence search method used in this study provides valuable hints about unrecognized bona fide SSPs.

Secretory peptide evolution in plants
On the basis of the SSP library created with the five reference species, the SSP family content was extended to 32 publicly available genomes of photosynthetic organisms (Supplementary Table 1) filtered with the same method as for the initial clustering. The resulting secretory peptide family library is a useful resource to search for known, as well as uncharacterized, SSPs encoded in plant genomes (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).
Despite the challenge of short ORF prediction and the unequal quality of genome annotations, a clear trend of SSP expansion can be observed: known SSPs are encoded in large families in land plants but are almost completely absent in Chlorophyta (Fig. 4). This phylogenetic pattern may indicate that unknown sets of intercellular signals, including secretory peptides, were required for the development of the complex architectures that characterize the land plant lineage.

SSP gene regulation in the course of Arabidopsis root development
Considering the established role of several secretory peptides in root development, the authors examined how SSP genes were expressed during LR formation in Arabidopsis. The aim was to test whether the spatiotemporal specificity of their transcription pattern could be a valuable predictor for their possible involvement in root development. To this end, SSP transcript levels were analysed in transcriptome experiments addressing early aspects of LR initiation, which takes place in the pericycle associated with the xylem poles and depends on a SOLITARY ROOT/INDOLE-3-ACETIC ACID14 (SLR/IAA14)-mediated auxin signalling cascade. Three datasets follow the transcriptional regulation occurring during the induction of LR initiation upon treatment: (i) with auxin and depending on SLR/IAA14 (Vanneste et al., 2005); (ii) with auxin and naxillin, a non-auxin-like LR-inducing molecule (De Rybel et al., 2012); and (iii) with auxin, specifically changes in the pericycle cells at the xylem pole (De Smet et al., 2008). Two other datasets address the spatial expression pattern of genes: (iv) the differential between the pericycle cells at the xylem or phloem pole (Parizot et al., 2012); and (v) specificity in the LRP, either in the entire pericycle or in one of its subpopulations (xylem or phloem pole) (Brady et al., 2007). The last dataset (vi) focuses on the temporal expression pattern in phase or antiphase with the auxin transcriptional response marker DR5 in the basal meristem (Moreno-Risueno et al., 2010).
First, the transcriptomics data were searched for patterns associated with known SSP gene families (Table 1). Although a portion of the SSP sequences are not represented on the Affymetrix ATH1 microarray (65 out of 148; 44%), half of the 83 known SSP genes with a corresponding probeset had a specific spatiotemporal expression pattern in at least one of the analysed experiments (FC ≥ 1.5, P ≤ 0.01; for additional information, see Materials and Methods; see also Supplementary Table 4). This observation suggests that many more secretory peptides might be involved in apoplastic signalling during LR initiation than previously recognized. This analysis was extended to genes belonging to uncharacterized SSP families, coding for motifs reminiscent of known signalling peptides (Fig. 3), and represented on the ATH1 microarray. Five genes in three families showed significant changes in at least one of the analysed experiments according to the same criteria as above (Table 2). At4G37295, At4G34600, and At4G37290 are induced in the xylem pole pericycle upon auxin treatment and depend on the IAA14/SLR pathway. At4G37295 and At4G37290 are also induced upon naxillin treatment. At4G37295 is specifically expressed in the LRP. At4G28460 and At1G49800 are in phase with the oscillating auxin response observed in the basal meristem with the DR5 marker, and the expression of At4G28460 is also higher in the phloem pole pericycle than that in the xylem pole pericycle. In conclusion, the expression of a large fraction of SSP-encoding genes is regulated during LR initiation, whether they have been recognized previously as involved in development or not.

SSP functional analysis
The activity of SSPs can be tested by the application of chemically synthesized peptides on plant tissues because the response they induce often mimics the cognate genetic gain-of-function phenotypes, as shown in Arabidopsis roots (Fernandez et al., 2013;Fiers et al., 2006;Matsuzaki et al., 2010;Whitford et al., 2012). Such experiments demonstrated that the bioactive portion of the SSP preproproteins is encoded in their C-terminal conserved sequences.
To investigate the potential role of uncharacterized SSPs, seedlings were grown on agar medium supplemented with synthetic peptides corresponding to conserved C-terminal stretches (Fig. 5; Supplementary Table 5). Whereas synthetic SSPs, including members of the CLV3/CLE and GLV/RGF/CLEL families, are active at nanomolar concentrations (Murphy et al., 2012), the absence of certain post-translational modifications in synthetic copies has been shown to reduce bioactivity compared with native peptides (Matsubayashi, 2014;Seitz, 2000;Shinohara and Matsubayashi, 2013). To avoid false-negative results due to lack of post-translational modification, micromolar concentrations of synthetic peptides were applied, as is commonly reported in such experiments.
The number of LRs and the primary root length were compared between control seedlings and seedlings treated with 1 µM or 10 µM of peptides for three uncharacterized families. Peptides (Pep) from families f31 and f919 decreased the number of emerged LRs. Pep f919-2 (At4G34600), in particular, resulted in a 70% decrease compared with control untreated seedlings (Fig. 6A; Supplementary Fig. 4). In all cases, the effect was stronger or only detectable at 10 µM. Furthermore, plantlets treated with 10 µM of Pep f31-2 (At4G37295) were pale and arrested in growth. From the family f1528, only Pep f1528-2-2 (At2G23270) and Pep f1528-3-2 (At4G37290) induced significant differences compared with control untreated plants (Fig. 6A; Supplementary Fig. 4). Peptides inhibiting LR emergence had no detectable effect on primary root growth, except Pep f31-1 and Pep f919-2 and, at high concentration, Pep f919-1 and Pep f1528-2-1 (Fig. 6B; Supplementary Fig. 4). As expected, treatment with randomized Pep f31-2 and Pep f919-2 showed no effect on either root growth or LR emergence.
In a recent independent study, Hou et al. (2014) showed that genes coding for peptides secreted in the apoplast are induced by pathogen-associated molecular patterns (PAMPs) and amplify immunity. The so-called PAMP-induced peptides PIP1 and PIP2 correspond to Pep f31-3 and Pep f1528-2, respectively, and share an SGPS motif in their C-terminal conserved region. The same report showed that the overexpression of prePIP1 and prePIP2 and the application of PIP1 and PIP2 synthetic peptides inhibited root growth, in agreement with the present results. The PIP family was further extended to include PIP-LIKE (PIPL) peptides, related to IDA/IDL and CEP peptides, and possibly involved in the response to biotic and abiotic stresses (Vie et al., 2015).
Fig. 3. Conserved SSP C-terminal sequences. Consensus sequences are represented for previously uncharacterized families. Conserved protein residues are higher in the HMM profile (see Supplementary Table 2).
To confirm the plausible role of the corresponding SSP genes in LR development, the authors quantified their transcriptional changes in the LR-inducible system (Himanen et al., 2002). In this experimental set-up, the first formative divisions are prevented by the auxin transport inhibitor NPA. Later, upon auxin (NAA) treatment, cells in the pericycle layer engage actively and synchronously in division. Quantitative reverse-transcription PCR (qRT-PCR) analysis showed very specific transcription patterns for some candidates (Fig. 6C). Expression of the genes analysed increased after both 2 h and 6 h for AT4G37295, AT5G43066, AT4G37290, and AT2G16385, but continuously decreased for AT4G28460 and AT4G34600. The expression level of AT3G06090 and AT2G23270 decreased after 2 h and increased after 6 h, while AT1G49800 had the opposite pattern of expression. These changes are in accordance with the transcriptome data and further indicate that the tested genes are involved in root development, including LR initiation (Fernandez et al., 2013;Ohyama et al., 2008).
Fig. 4. SSP evolution in plants. For each genome, the number of proteins in a given secretory peptide family is represented as shown in the bottom bar: species with no SSP are encoded in grey, those with one SSP in white, and those with a higher number of SSPs in increasingly deep red. The graph was generated with the MeV software package (Saeed et al., 2003).
Finally, the authors investigated whether the phenotype caused by newly discovered bioactive peptides may be an indication of their plausible function. Cleared roots were analysed after treatment with Pep f919-2, which is the strongest inhibitor of root branching in this study (Fig. 6), and compared with untreated roots or roots treated with a randomized Pep f919-2 (Fig. 7). This experiment confirmed that Pep f919-2 significantly decreased the number of emerged LRs. However, the peptide treatment did not affect the number of primordia being initiated (Fig. 7A). Instead, Pep f919-2-treated roots carried an unusually high number of primordia at stage V of development, which normally precedes the progression of the LR through the overlying cell layers (endodermis, cortex and epidermis) before it emerges from the body of the main root (Malamy and Benfey, 1997). Furthermore, the shape of the primordia was clearly different depending on the root treatment. Most primordia grew with a classical dome shape in the control plants (Fig. 7B). In contrast, in Pep f919-2-treated roots, the vast majority of LRPs appeared flattened as if pressed against the overlying tissues (Fig. 7C, D).
The reduced LR density and flattened primordium phenotypes are very similar to those of the ida and hae hsl2 mutants (Kumpf et al., 2013). In wild-type roots, LR emergence is promoted by auxin fluxes redirected in the LRP and surrounding tissues that eventually lead to the induction of auxin- and IDA-responsive genes. These genes code for cell wall-remodelling enzymes that trigger cell separation as they open the way to the protruding primordium (reviewed in Atkinson et al., 2014). In ida and hae hsl2 as well as in other auxin transporter mutants, overlying tissues fail to soften and LRP development stalls as emergence is blocked.
These observations suggest that AT4G34600 takes part in the events preparing for the penetration of the LR through the outer layers of the root: its expression normally decreases during LR formation, and the exogenous application of the f919-2 secreted peptide it encodes resulted in compression of the LRP and the inhibition of LR emergence. While the molecular function of AT4G34600 remains to be elucidated, the data collected so far provide a good framework for future studies.

Discussion
A bottleneck in the functional study of signalling peptides in plant growth and development has been the identification of the encoding genes. Whereas the sequencing of different plant genomes has led to the prediction of numerous small genes, some of which potentially encode signalling peptides, the identification of conserved families via comparative genomics is difficult, because their bioactive domains are restricted to just a few amino acids.
Figure legend (fragment): (n = 19-44). Seedlings (10 days after germination) were compared with controls after treatment with the indicated peptides. Error bars represent the 95% confidence interval. Asterisks mark significant differences: * P < 0.05; ** P < 0.005; *** P < 0.001. Data were pooled from independent biological replicates. (C) Induction of SSP gene transcription by auxin. Seedlings were treated with 1 µM NAA for the indicated time points. Fold changes were measured after qRT-PCR analysis of root tissues. Data are shown for one of two independent experiments. np, no peptide.
Unlike previous studies that relied solely on the SSP information embedded in the Arabidopsis genome annotation (Lease and Walker, 2006;Silverstein et al., 2007), the de novo comparative genomics approach used in this study takes advantage of additional available plant genomes without prior knowledge of the SSP sequence information, resulting in a fine resolution of the SSP families. Including multiple plant species in the analytical pipeline increases the sensitivity with which large SSP families are separated into multiple smaller groups. The subsequent profile comparison improved the clustering specificity. The authors' bioinformatic approach produced a classification that can be updated rapidly and regularly as genome annotation information accrues. The searchable public website presenting the SSP classes and the corresponding consensus sequences across multiple plant species is a valuable resource to explore understudied peptide regulators or to identify homologues in crops (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/). Finally, the consensus motifs that were found can serve as functional domain hallmarks to search for small missed genes, either in assembled genome sequences or in shorter RNA-sequence reads.
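As a rough illustration of this last point, the sketch below translates a nucleotide sequence in all six reading frames and reports matches to a motif written as a regular expression. The motif `PSGP.P`, the toy sequence, and the function names are hypothetical placeholders chosen for this example; they are not the consensus motifs or the pipeline of this study, and a real search would more likely rely on profile-based methods than on a regular expression.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(dna, motif):
    """List (strand, frame, protein_position, match) for every motif match."""
    dna = dna.upper()
    pattern = re.compile(motif)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for m in pattern.finditer(protein):
                hits.append((strand, frame, m.start(), m.group()))
    return hits


# Hypothetical motif and toy sequence (encodes M-P-S-G-P-D-P), for illustration only.
TOY_MOTIF = "PSGP.P"
TOY_DNA = "ATGCCTTCTGGTCCTGATCCT"

if __name__ == "__main__":
    for hit in six_frame_hits(TOY_DNA, TOY_MOTIF):
        print(hit)
```

Scanning both strands and all three frames matters for unannotated sequence, because neither the orientation nor the reading frame of a missed small gene is known in advance. Short reads can be passed to the same function one at a time.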
The meta-analysis of transcriptome data linked to LR development has already led to the discovery of several genes proven to be involved in LR development in follow-up genetic studies (GATA23, De Rybel et al., 2010;E2Fa, Berckmans et al., 2011;PdBG1, Benitez-Alfonso et al., 2013;totipotency genes, Chupeau et al., 2013;PLT3, Zhang et al., 2013;PDCB1, Maule et al., 2013). To point out the potential involvement of unidentified candidate SSP families in the process of LR development, the authors of the present study identified genes with specific expression patterns during LR initiation and showed that the majority of encoded conserved peptides tested altered the growth of Arabidopsis roots when applied exogenously, some in very specific ways. Peptide assays are cheap, easy, and rapid first steps toward the classification of non-cell-autonomous factors potentially involved in development. They can be adapted to a wide range of processes.
Of course, a refined understanding of SSP function requires additional studies to avoid the pitfalls of gain-of-function phenotypes: non-physiological concentrations of signal molecules may create artefacts, for example, by hijacking downstream pathways of related, but distinct, peptide signal(s); in addition, exogenous applications are not directional, whereas SSP genes are often expressed in very specific cell types, as again demonstrated here. Nevertheless, these results indicate that the successive combination of SSP gene annotation, expression studies, and in vivo peptide assays is a useful approach to start rapidly probing the complexity of the extracellular signalling networks that drive plant tissue growth and development.

Supplementary data
Supplementary data are available at JXB online.
Supplementary Fig. 1. CLE peptide bioactive domain defined by multiple sequence alignments and HMM logos.
Supplementary Fig. 2. GLV peptide bioactive domain defined by multiple sequence alignments.
Supplementary Fig. 3. Multiple sequence alignment of the C-terminal 50 amino acids of the Arabidopsis CLE family.
Supplementary Fig. 4. Root-related phenotypes are not induced by randomized peptide sequences.
Supplementary Table 1. Genomes of photosynthetic organisms included in the SSP family definition.
Supplementary Table 2. SSP clusters and families constructed with the Markov Cluster Algorithm and Profile Comparer and based on the five reference species.
Supplementary Table 3. SSP genes collected as a benchmark set for de novo secretory peptide detection algorithms.
Supplementary Table 4. Specific expression patterns of known SSP genes during LR formation.
Supplementary Table 5. Synthetic peptide sequences tested for effect on root growth and development.
Supplementary Table 6. Primers used for qRT-PCR analysis.