Evolutionary conservation of Ebola virus proteins predicts important functions at residue level

Abstract
Motivation: The recent outbreak of Ebola virus disease (EVD) resulted in a large number of human deaths. Due to this devastation, the Ebola virus has attracted renewed interest as a model for virus evolution. Recent literature on Ebola virus (EBOV) has contributed substantially to our understanding of the underlying genetics and its scope with reference to the 2014 outbreak. However, no study has yet focused on the conservation patterns of EBOV proteins.
Results: We analyze the evolution of functional regions of EBOV and highlight the function of conserved residues in protein activities. We apply an array of computational tools to dissect the functions of EBOV proteins in detail: (i) protein sequence conservation, (ii) protein–protein interactome analysis, (iii) structural modeling and (iv) kinase prediction. Our results suggest the presence of novel post-translational modifications in EBOV proteins and a role for them in the modulation of protein functions and protein interactions. Moreover, on the basis of the presence of ATM recognition motifs in all EBOV proteins, we postulate a role of DNA damage response pathways and ATM kinase in EVD. We put forward the ATM kinase, for further evaluation, as a novel potential therapeutic target.
Availability and Implementation: http://www.biw.kuleuven.be/CSB/EBOV-PTMs
Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction
Ebola virus (EBOV) is a member of the family Filoviridae and causes a severe hemorrhagic fever known as Ebola virus disease (EVD), with a mortality rate of up to 90%. EBOV is a rather small pathogen with only seven genes. The recent outbreak, with its epicenter in Guinea, has sparked a number of new genomics studies (Gire et al., 2014), which showed that the virus proteins are not undergoing rapid evolution (Hoenen et al., 2015) and that mutations in different EBOV proteins are correlated with lethality (Deng et al., 2015). These studies did not address the highly conserved parts of the proteins, which we think may provide structural and functional insight into the molecular functions of EBOV proteins.
To date, no suitable medicine for EVD is commercially available. Protein vaccines have failed to produce the desired results in human subjects (Ponomarenko et al., 2014). Recently, host calcium channels were shown to be involved in virus entry into the host cell (Sakurai et al., 2015) and were put forward as a novel drug target. Host kinases also play a role in EVD, such as PI3K and the PI3K/Akt pathway in EBOV proliferation (Saeed et al., 2008). The interaction of ABL1 kinase with the matrix protein VP40 has been linked to EBOV endocytosis (Garcia et al., 2012). This association might facilitate morphogenesis and budding of new virions.
Viruses routinely exploit different pathways and signaling cascades for infectivity (Lilley et al., 2007). An example is the manipulation of Ras/MAPK pathways via the IFN response, which is associated with increased EBOV infection in mice (Strong et al., 2008). The activation of mitogen-activated protein kinase (MAPK) pathways such as the p38 pathway, including the DNA damage response, is dependent on ATM (Munshi and Ramesh, 2013). To gain more insight into the molecular biology of EVD, we here aim to identify evolutionarily conserved residues and assign functions to them. We do this by (i) collecting known and predicting novel post-translational modifications on EBOV proteins, (ii) mapping conserved residues onto three-dimensional structures, (iii) collecting protein-protein interactions between virus and host proteins, (iv) finding motifs that may mediate protein-protein interactions and (v) mapping conserved and modified residues to interaction interfaces.

Methods
We calculated residue-level conservation for each EBOV protein based on 520 unique EBOV genomes. We annotated the proteins with post-translational modifications (PTMs) and motifs and mapped these annotations to known and modeled 3D structures. For more details, see Supplementary Methods.
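As an illustration of what a residue-level conservation score can look like, the following minimal sketch scores each column of a multiple sequence alignment by its Shannon entropy. This is not necessarily the scoring scheme used in this study (see Supplementary Methods); the entropy-based score, the function name `column_conservation` and the toy alignment are assumptions made for the example.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    non-gap residues in the column. Gap-only columns score 0.0.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (hypothetical sequences, not EBOV data).
toy_alignment = ["MSTQL", "MSAQL", "MSTQI", "MS-QL"]
scores = column_conservation(toy_alignment)
# 1-based positions of fully conserved columns.
conserved_positions = [i + 1 for i, s in enumerate(scores) if s == 1.0]
```

In this toy case, columns 3 and 5 contain more than one residue type and therefore score below 1, while the remaining columns are fully conserved.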

Results
To identify important functions of individual residues in EBOV proteins, we combined residue conservation with several computational predictions and existing data.

Molecular functions of conserved residues
We created multiple sequence alignments for all EBOV proteins based on whole-genome sequencing data of 520 EBOV genomes and identified highly conserved regions. Next, we assigned molecular functions to those conserved regions. Post-translational modifications (PTMs) are known for their role in protein activities, and it has been shown that modified residues are more conserved than unmodified residues (Minguez et al., 2012). We collected published data on modified residues in Ebola proteins. Only a few PTMs have been characterized so far, including thirteen phosphorylation sites; we therefore extended the PTM repertoire by predicting novel post-translational modifications. As EBOV proteins reside in the human host cell, we assume that human modifying enzymes can modify them. We employed widely used, published neural network and SVM-based prediction methods trained on human PTM data to predict modified residues on EBOV proteins. We validated this strategy against published phosphorylation data; the prediction methods recovered eight out of thirteen published phosphorylation sites in EBOV (Supplementary Table S1). For further analyses, our final repertoire included the modified residues with the highest prediction values and all previously published PTMs.
The new list of PTMs comprises 10 different types and 88 conserved PTM-carrying amino acid residues in seven EBOV proteins (Fig. 1, Supplementary Table S2). Among the predicted PTMs in EBOV proteins, phosphorylation is the most prevalent type with 42 sites, followed by methylation and O-glycosylation. Some of the PTMs are located in functional domains of EBOV proteins (Supplementary Table S2). Modified residues at protein interaction interfaces may be important to virus proteins, as immune escape is achieved through interaction with different host proteins (Fig. 2). The presence of different PTM types on EBOV proteins, such as methylation and phosphorylation in the glycoprotein (GP), may explain the so far unexplained multiplicity of EBOV protein functions and regulation. Although no specific role could previously be found for GP lipidation (Ito et al., 2001), the conservation level suggests a crucial role, potentially in binding GP to lipid rafts together with the newly predicted conserved GPI-anchor site. The conserved SUMOylated residues in the L protein could be responsible for nuclear localization of this protein.
Interestingly, for some modification types no sites were predicted at all; for example, we did not predict any ubiquitination site in any of the EBOV proteins. A simple explanation may be that virus proteins avoid degradation and thus lack sequences that resemble ubiquitination signals.
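The validation and selection steps described above can be sketched as follows. Recall is the fraction of published sites that the predictors recover (eight of thirteen, about 0.62, in the validation reported here). The score threshold, the function names and the example sites are hypothetical and serve only to illustrate the bookkeeping; the actual selection criterion is given in the Supplementary Methods.

```python
def recall(predicted, published):
    """Fraction of published sites that were also predicted."""
    published = set(published)
    if not published:
        raise ValueError("no published sites to compare against")
    return len(published & set(predicted)) / len(published)


def select_sites(predictions, published, threshold):
    """Keep predictions scoring at or above threshold, plus all published sites."""
    kept = {site for site, score in predictions.items() if score >= threshold}
    return sorted(kept | set(published))


# Hypothetical (protein, position) sites with made-up prediction scores.
predictions = {("protA", 13): 0.95, ("protA", 200): 0.40, ("protB", 30): 0.88}
published = [("protA", 13), ("protC", 5)]

toy_recall = recall(predictions, published)
final_sites = select_sites(predictions, published, threshold=0.8)
```

Here one of the two published toy sites is recovered, and the final list keeps the two high-scoring predictions plus the published site that the predictor missed.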

Conserved residues in 3D structures
We mapped conservation from the multiple sequence alignments of EBOV proteins onto their tertiary structures (Celniker et al., 2013). The tertiary structures were retrieved from PDB (Berman et al., 2000). Only VP24 has a completely resolved protein structure, whereas the other EBOV proteins have partial structures in PDB, either alone or in complex with other proteins (except for the L protein, for which no structure is available). To evaluate the conservation of complete proteins in different structural domains, we extended our analyses by modeling complete EBOV proteins. We used a knowledge-based computational method to predict and refine the complete protein structures (Yang et al., 2014; Zhang, 2009; for details see Supplementary Methods). EBOV proteins contain a variety of domains, of which some are completely conserved and others are completely variable (Fig. 1, Supplementary Fig. S1). For example, in the glycoprotein the unstructured regions have very low sequence conservation, whereas the helical parts are highly conserved. We then assigned the known and predicted PTMs to conserved residues. This results in 14 modified residues in known EBOV protein structures (Supplementary Table S3) and 88 if modeled structures are included (Supplementary Table S2).

Short motifs predict interactions with host proteins
Another approach to obtain mechanistic insight into virus protein functions is based on finding short linear motifs. We employed Gibbs sampling (Davey et al., 2010) and identified [ST]N.L.[FIV], known as the Dok1 PTB domain binding motif, in all EBOV proteins except VP30, and twice in VP40. Dok proteins are adaptor molecules known for their role in the regulation of signal transduction (Zhang et al., 2004). Conserved PTB domains are found in many proteins and are implicated in protein-protein interactions (Zhang et al., 2004). The Dok1 protein is known to bind through its PTB domain to protein kinases, including Abelson tyrosine kinase (ABL1) (Cong et al., 1999). ATM activates ABL1 (Shafman et al., 1997), and ABL1 kinase phosphorylates many targets to facilitate cellular responses such as intracellular mobility. A second motif, NPG.C, was found only in VP30, VP35 and the L protein. This motif is called the phosphotyrosine-independent PTB domain motif; it binds Dab-like PTB domains and takes part in signaling pathways such as endocytosis.
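Once a motif is known, its occurrences in a protein sequence can be located with a regular expression. The minimal sketch below scans a sequence for the two patterns quoted above, reading "." as "any residue". It only illustrates pattern matching and does not reproduce the motif discovery step; the dictionary keys, the function name `find_motifs` and the toy sequence are hypothetical.

```python
import re

# Motif patterns as read from the text; '.' matches any residue.
MOTIFS = {
    "Dok1_PTB": "[ST]N.L.[FIV]",
    "Dab_like_PTB": "NPG.C",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif_name: [1-based start positions]}, including overlapping hits."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead lets finditer report overlapping occurrences.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in regex.finditer(sequence)]
    return hits


# Toy sequence (hypothetical, not an EBOV protein).
toy_sequence = "MASNALKFGGNPGTCW"
toy_hits = find_motifs(toy_sequence)
```

In the toy sequence, SNALKF at position 3 matches the first pattern and NPGTC at position 11 matches the second.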

Modified residues at protein interaction interfaces
EBOV proteins interact with other virus proteins during spreading and with human proteins to evade host immunity; conversely, human proteins interact with EBOV proteins as part of the immune response. To inspect the major interactions of EBOV proteins with host proteins or other virus proteins, we reconstructed the EBOV protein interactome based on a literature review (Fig. 2, Supplementary Table S4). We hypothesized that residues at interaction interfaces should be more conserved and play a functional role in protein-protein interactions. We identified conserved residues with annotated PTMs in the interaction interfaces of known virus-host and virus-virus protein complexes from PDB (Table 1). Three modified sites of the glycoprotein are located in the interface with a neutralizing antibody, one in the NP-VP35 interface, one in the interface of the VP40-Nedd4 complex, one in the interface between VP24 and KPNA1, and one phosphosite in the same interface motif of two different 3D structures of the VP35 inhibitory domain with dsRNA (Supplementary Table S1). The presence of these predicted modifications at conserved residues in the interfaces of protein complexes suggests an important role of PTMs in EBOV protein complex formation.
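The overlap between modified residues and interface residues amounts to a simple lookup once both sets of positions are available. The sketch below assumes that the interface positions have already been extracted from a complex structure (how that is done is not shown) and additionally flags modified residues that lie directly next to an interface residue. The function name, the `window` parameter and the residue numbers are hypothetical.

```python
def map_sites_to_interface(modified, interface, window=1):
    """Classify each modified position as 'in', 'adjacent' or 'outside' the interface.

    'adjacent' means within `window` sequence positions of an interface residue.
    """
    interface = set(interface)
    result = {}
    for pos in modified:
        if pos in interface:
            result[pos] = "in"
        elif any(abs(pos - i) <= window for i in interface):
            result[pos] = "adjacent"
        else:
            result[pos] = "outside"
    return result


# Hypothetical residue numbers for one protein in one complex.
modified_positions = [150, 272, 300]
interface_positions = [150, 151, 271]
classification = map_sites_to_interface(modified_positions, interface_positions)
```

A sequence window is only a rough proxy for proximity; a distance cutoff on the 3D coordinates would be the more direct criterion when a structure is available.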

ATM kinase central for EBOV functions
We analyzed which kinases could be involved in the phosphorylation events of EBOV using published SVM-based methods (Wong et al., 2007). We identified three potential human kinases: ATM (ataxia telangiectasia mutated), GSK3 (glycogen synthase kinase 3) and CK2 (casein kinase 2). The ATM kinase could be linked to the modification of the majority of phosphorylation sites (Table 2). The ATM kinase is a major regulator of signaling pathways such as the DNA damage response (DDR) pathway (Supplementary Tables S6, S7) and interacts with other important kinases and transcription factors such as ABL1, ATR, AKT1, PI3K, p53 and MDM2 (van der Lee et al., 2014; Shafman et al., 1997; Shiloh and Ziv, 2013) (Fig. 3). ABL1 is a master regulator of cytoplasmic mobility, and ABL1 itself is phosphorylated by ATM kinase. Tyrosine phosphorylation of VP40 has been shown to be essential for virion egress (Fig. 3; Garcia et al., 2012). p53 and MDM2 are integral parts of the apoptosis, cell cycle and PI3K/Akt signaling pathways. EBOV uses PI3K signaling for cellular entry (Saeed et al., 2008). Finally, ATM has been shown to regulate Wnt signaling (Svegliati et al., 2014), and both CK2 and GSK3 are part of the Wnt signaling pathway (Seldin et al., 2005). Many viruses exploit DDR pathways for their own survival inside the host (reviewed in Lilley et al., 2007). Based on the presence of ATM kinase motifs in EBOV proteins, we suggest that EBOV activates DDR pathways as well (Fig. 3).
Table 1. Modified residues overlap with residues in interaction interfaces in experimentally resolved 3D structures. Modified residue 272 of VP35 is next to residue 271, which is part of an interaction interface.
In this study we have identified highly conserved residues in EBOV proteins and explored their functional attributes by analysis of post-translational modifications, protein-protein interactions and linear motifs. The presence of ATM kinase motifs suggests that DDR pathways are activated in the presence of EBOV and may trigger ATM-dependent phosphorylation cascades. Together, these data suggest that ATM is an interesting potential therapeutic target to be explored in the context of EVD.

Funding
This work has been supported by the KU Leuven Research fund.