Revised nomenclature and functional overview of the ULP gene family of plant deSUMOylating proteases

Functional insight on the post-translational modifier SUMO and its biochemical pathway in plants has steadily increased over the past decade. In contrast to the low number of core components that catalytically control SUMO attachment to targets, the enzymes that control deconjugation and SUMO maturation seem to have diversified in terms of both gene number and biological function. However, studies on these deSUMOylating proteases have been accompanied by diversity in nomenclature and unclear evolutionary categorization. We provide a state-of-the-art assessment of the evolutionary subclades within the ULP gene family of plant deSUMOylating proteases, and propose a nomenclature for this protease subgroup for consistent annotation of ULP-encoding genes in plant genomes.

Functional insight on the post-translational modifier SUMO and its biochemical pathway in plants has steadily increased over the past decade. In contrast to the low number of core components that catalytically control SUMO attachment to targets, the enzymes that control deconjugation and SUMO maturation seem to have diversified in terms of both gene number and biological function. However, studies on these deSUMOylating proteases have been accompanied by diversity in nomenclature and unclear evolutionary categorization. We provide a state-of-the-art assessment of the evolutionary subclades within the ULP gene family of plant deSU-MOylating proteases, and propose a nomenclature for this protease subgroup for consistent annotation of ULPencoding genes in plant genomes.
The Small Ubiquitin-like Modifier (SUMO) polypeptide is a member of the Ub-fold family, which is collectively defined by a signature β-grasp fold. Like ubiquitin (Ub), SUMO acts in the post-translational modification of proteins, and is important for plant development and adaptive responses to the environment (Castro et al., 2012;Yates et al., 2016). The SUMO conjugation and deconjugation cycles have to be tightly regulated, and numerous SUMO proteases are fundamental for this equilibrium. Several types of deSUMOylating proteases (DSPs) were uncovered in non-plant models, namely ULP/SENPs, DESIs and USPLs, which belong to separate families of cysteine proteases (C48, C97 and C98, respectively) (Hickey et al., 2012;Nayak and Muller, 2014). Presently, the only functionally characterized plant DSPs belong to the Ub-Like Protease (ULP) gene family.

Evolution and nomenclature in plant ULPs
ULPs are cysteine proteases belonging to the C48 family (MEROPS release 12.0; Rawlings et al., 2018). Despite sharing similarities with the catalytic domains of some classes of deubiquitylating proteases, such as Ubiquitin Specific Proteases (UBPs) and Ubiquitin C-terminal Hydrolases (UCHs), they belong to different clans (clan CE for ULPs, and clan CA for UBPs and UCHs). CE and CA proteases share a papain-like fold and, most likely, a common origin (van der Hoorn, 2008;Rawlings et al., 2018). Historically, ULPs have been divided into two large groups (ULP1s and ULP2s), following the identification of two functionally separate paralogs -ScULP1 and ScULP2/Smt4 in yeast Hochstrasser, 1999, 2000). Later, human ULPs were also differentiated into ULP1s (SENP1, -2, -3 and -5), and ULP2s (SENP6 and -7) (Mukhopadhyay and Dasso, 2007). Plant deSUMOylating proteases belonging to the ULP gene family have mostly been studied in the model plant Arabidopsis. Despite the significant functional advances, difficulties have arisen in establishing definitive gene abundance, phylogeny and nomenclature of this gene family.

Gene abundance
The Arabidopsis genome is assumed to contain eight ULPs (Box 1) (Novatchkova et al., 2012;Castro et al., 2016;Benlloch and Lois, 2018;Garrido et al., 2018). Often, however, only seven have been described because of the failure to incorporate At3g48480 (Novatchkova et al., 2004;Colby et al., 2006;Hoen et al., 2006), as this is a highly truncated form albeit one that retains the protease domain. Also, initial phylogenetic studies incorporated At5g60190 (Novatchkova et al., 2004;Hoen et al., 2006), which was subsequently identified Box 1. Plant ULP evolution and nomenclature (A) A schematic tree, depicting currently accepted phylogenetic relationships between organisms, summarizes the evolutionary path of the plant ULP gene family of deSUMOylating proteases. Plant ULPs have a polyphyletic origin than can be traced to green algae and ultimately to examples in other eukaryotes, including ScULP1 and ScULP2. ULP1s form a homogenous class (Class I, ELS-type), while ULP2s branch out into Class II (OTS-type) and Class III (SPF-type) proteases during early plant evolution. Class IV (FUG-type) consistently appears in flowering plant genomes and seems absent from early plant taxa, but its origin remains elusive (Castro et al., 2018). (B) Existing nomenclature for all Arabidopsis ULPs. We propose a nomenclature that reflects biological function and assumed phylogenetic relationships. It incorporates new gene names for two Arabidopsis ULPs (highlighted in blue). In future annotation of plant genomes, plant ULPs may be spelled with a prefix of the species, followed by increasing numbering. For example, tomato Class II ULPs may be named SlOTS1, SlOTS2, and so on. References in main text; see also Miura et al., 2007. as a deNEDDylating rather than a deSUMOylating protease, and named Deneddylase 1 (DEN1; Box 1) (Colby et al., 2006;Mergner et al., 2015). Initial reports similarly established a massive gene expansion in this gene family (Kurepa et al., 2003;Hoen et al., 2006;Lois, 2010). This has been traced to the presence of at least 97 MULE transposons that contain intact peptidase C48 domains, and are likely to have expanded via ancient transduplication events (Hoen et al., 2006). Though these amplified genomic loci may encode polypeptides that possess SUMO protease activity, they are phylogenetically more distant than the deNEDDylating protease DEN1 when compared to ULPs, and display low or undetectable expression, which suggests they are unlikely to act towards SUMO (Hoen et al., 2006). Hoen and co-workers (2006) have named these Kaonashi (KI) elements, and here we propose a definitive nomenclature as Kaonashi ULP Like Proteases (KIUs) (Box 1).

Gene phylogeny
The eight canonical Arabidopsis ULPs have consistently been categorized in light of their strong amino acid sequence conservation to yeast ULP1 or ULP2 (Kurepa et al., 2003;Novatchkova et al., 2004;Mukhopadhyay and Dasso, 2007;Lois, 2010), though they can be resolved into additional phylogenetic subgroups (Colby et al., 2006;Novatchkova et al., 2012) (Box 1). Insight based on more extensive comparative genomics data suggests that At4g15880/ At3g06910/At4g00690 form a homogenous class of ULP1s (homologous to yeast ScULP1). In contrast, Arabidopsis homologs of ScULP2 can be divided into three classes, containing At4g33620/At1g09730, At1g10570/At1g60220 and At3g48480 (Novatchkova et al., 2012;Castro et al., 2018). Existence of four classes is also supported by protein topological data, namely protein size and the location of the ULP domain (Benlloch and Lois, 2018;Castro et al., 2018).
Here, we propose a definitive classification for the four plant ULP classes (Classes I-IV) based on the Arabidopsis ULPs (Box 1).

Gene nomenclature
The community has been struggling to define a coherent naming of Arabidopsis ULPs. Initially they were named after assumed phylogenetic relatedness to ULP1 or ULP2 proteins. Erroneously, this led to the naming of At1g10570, At1g60220 and At3g48480 as ULP1c, ULP1d and ULP1e, respectively (Kurepa et al., 2003;Lois, 2010;Castro et al., 2016), even though they are phylogenetically related to ULP2s. Functional studies in Arabidopsis generated an increasing number of names that disregarded molecular function in favor of biological function, resulting in several parallel nomenclatures. Most ULP genes have between two and as many as four names for a single member. It is important to clarify this matter to create a consensual nomenclature based on biological function, while at the same time respecting known phylogenetic data. The proposed nomenclature is detailed in Box 1.

ULP function
It is well established in non-plant models that ULPs are regulated at various levels, including enzymatic activity, SUMO isoform discrimination, subcellular localization and expression patterns (Hickey et al., 2012;Nayak and Muller, 2014;Kunz et al., 2018). A series of clues point towards similarly complex functionalities for plant ULPs. Characterization of loss-offunction Arabidopsis ULP mutants has implicated the different ULP classes in non-redundant functions during plant development. The esd4 mutant has a pleiotropic phenotype accompanied by early flowering, partially due to SA accumulation (Murtas et al., 2003;Villajuana-Bonequi et al., 2014), while loss-of-function of its closest paralog ELS1 does not display such a drastic phenotype (Hermkes et al., 2011). OTS mutants assume a mild developmental phenotype (smaller and earlyflowering plants), and are also implicated in abiotic and biotic stress resistance (Conti et al., 2008;Bailey et al., 2016;Castro et al., 2016). In contrast, SPF-class mutants are late flowering, and display an altered growth pattern and embryo development defects (Kong et al., 2017;Liu et al., 2017;Castro et al., 2018). The fourth class of ULPs, represented in Arabidopsis by FUG1, is yet to be functionally addressed. Future studies may bring to light additional deSUMOylating protease gene families other than ULPs, adding complexity to the SUMO pathway.
As previously established for non-plant ULPs, different subcellular targeting is an important aspect of ULP molecular function (Hickey et al., 2012;Nayak and Muller, 2014;Kunz et al., 2018). In Arabidopsis, ESD4 interacts with the nuclear pore component NUA, which concentrates its location at the inner nuclear side of the nuclear pore (Xu et al., 2007). In contrast, ELS1 resides in the cytoplasm, which supports low functional redundancy between Class I proteases in Arabidopsis (Hermkes et al., 2011). OTS1, OTS2, SPF1 and SPF2 are nuclear proteins (Conti et al., 2008;Liu et al., 2017;Castro et al., 2018). With the possible exception of the functionally uncharacterized genes ELS2 and FUG1, Arabidopsis ULPs are widely expressed. In classes I and II, there is one ULP that is more expressed than the remaining class members (ESD4 and OTS1, respectively). OTS1 and OTS2 seem to display similar expression patterns but differences in expression amplitude, while SPF1 and SPF2 show differential expression patterns, collectively explaining the existence of unequal functional redundancy in these gene pairs (Castro et al., 2016;Liu et al., 2017;Castro et al., 2018).

Further research on plant deSUMOylating proteases
Our understanding of the functions of deSUMOylation, reviewed more extensively by Benlloch and Lois (2018), is at present very limited. Foremost among future research efforts is determining whether deSUMOylating proteases in general, and ULPs in particular, display a preferential capacity to act as endopeptidases (involved in maturation of preSUMO peptides) or as isopeptidases (removal of SUMOs from SUMO conjugates). Also of significance is the establishment of affinity towards the different SUMO isoforms present in plant genomes, and whether they display capacity to process poly-SUMO chains. Crystal structure and docking studies of catalytic domains are also needed to complement our analysis of proteolytic activity. The over-representation of ULP gene members in plant genomes in comparison with SUMO conjugation components (Augustine et al., 2016;Castro et al., 2018;Garrido et al., 2018), suggests that ULPs are likely to function, to some extent, as sources of specificity within the SUMO pathway. Proteomics strategies to identify large numbers of SUMO conjugates are progressively being introduced in Arabidopsis SUMO research (Budhiraja et al., 2009;Miller et al., 2010;Lopez-Torrejon et al., 2013;Miller et al., 2013;Rytz et al., 2018). Application of these strategies in ULP mutant backgrounds should help us define the target specificity of these proteases.
As we move away from Arabidopsis to non-model plants, it is important to have a clear vision of ULP function and target specificity, but also of gene abundance and the evolutionary pathway of this gene family. Sound and precise nomenclature should provide a beneficial contribution.