The TMCrys server for supporting crystallization of transmembrane proteins

Varga, Julia K; Tusnády, Gábor E

doi:10.1093/bioinformatics/btz108

Abstract

Motivation

Because of their special properties, the structures of transmembrane proteins are extremely hard to determine. Several methods exist to predict the likelihood that the structure determination process will be completed successfully. However, the available predictors incorporate data from all kinds of proteins, so they can hardly differentiate between crystallizable and non-crystallizable membrane proteins.

Results

We implemented a web server to simplify running the TMCrys prediction method, which was developed specifically to separate crystallizable and non-crystallizable membrane proteins.

Availability and implementation

http://tmcrys.enzim.ttk.mta.hu

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Transmembrane proteins (TMPs) play vital roles in cells, acting as gatekeepers and receptors in the cell and organelle membranes. They are frequently targeted by pharmaceuticals: a survey found that more than 50% of marketed drugs interact with TMPs (Hopkins and Groom, 2002). Although about 25% of the human proteome consists of TMPs (Dobson et al., 2015), only 2% of all known protein structures belong to them (Kozma et al., 2013), and fewer than a hundred non-redundant human TMP structures have been determined (Varga et al., 2017). Knowing the structure of TMPs may aid drug development by providing targets for ligand screening and enabling the creation of models for proteins with unknown structures. However, membrane proteins reside in the cell membrane, which makes structure determination extremely difficult.

In the last 10 years, several prediction methods have been developed to enhance the success of structure determination by estimating the chance of successful experiments. Most of them use data from TargetTrack (Berman et al., 2009) or its predecessors PepcDB and TargetDB (Chen et al., 2004), together with PDB structures (Kouranov et al., 2006). However, almost all of them mix globular and TM proteins, so they predict TMPs as ‘hard to crystallize’ (or something equivalent) and cannot distinguish between crystallizable and non-crystallizable TMPs. The only TMP-specific method is MEMEX (Martin-Galiano et al., 2007), but as it was created in 2008, the data it uses are outdated. We introduced the TMCrys method (Varga and Tusnády, 2018) to aid the process of structure determination of TMPs. Since the TMCrys algorithm requires installing several libraries and software packages, we here introduce the TMCrys server, which provides a graphical user interface for running the prediction on our HPC cluster and thus facilitates the use of the method.

2 Materials and methods

2.1 Introduction to TMCrys

Training and test datasets for TMCrys were created using the PDBTM and TargetTrack databases as described in Varga and Tusnády (2018). Several physical and chemical features describing the sequences were calculated using the topology of the protein, predicted by the CCTOP algorithm (Dobson et al., 2015), and other programs (Overton and Barton, 2006; Petersen et al., 2009; Walker, 2005; Xiao et al., 2015). Three XGBoost decision tree models were trained to predict the success of purification, solubilization and crystallization, respectively. Finally, a model aggregating the results of the three steps was computed to predict the success of the whole process. The models were evaluated using 10-fold cross-validation and tested on their respective hold-out datasets.

2.2 Reliability of the predictions

Reliability of the prediction was defined as the distance from the threshold of the calculated probabilities, normalized to one:

\mathrm{reliability} = \frac{0.85 + 0.15 \cdot \left| \mathrm{probability} - \mathrm{threshold} \right|}{\mathrm{threshold}},

where the threshold for the whole process was 0.85.
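As a minimal sketch, the expression can be transcribed directly into Python. The function name `reliability` and the range check on the input are assumptions made for illustration; the arithmetic follows the formula exactly as printed above and is not taken from the TMCrys source code.

```python
def reliability(probability: float, threshold: float = 0.85) -> float:
    """Evaluate the reliability formula as printed in Section 2.2.

    `probability` is the predicted probability for the whole process and
    `threshold` is the decision threshold (0.85 in the text).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Numerator: 0.85 plus 0.15 times the distance from the threshold.
    numerator = 0.85 + 0.15 * abs(probability - threshold)
    return numerator / threshold
```

With this transcription, the value grows as the predicted probability moves away from the threshold in either direction.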

2.3 The TMCrys server

The TMCrys server was developed using the Laravel web application framework (version 5.5.2) and designed with Bootstrap 3.2.7. Upon submission of a job, the sequences are forwarded to a high-performance computing (HPC) cluster. An Apache Axis server monitors the jobs on the cluster and provides the basis of the communication between the HPC cluster and the hosting server. The status of the job and the results are retrieved using SOAP requests. To speed up the process, several programs and scripts are run simultaneously to calculate the features for the prediction. The results are sent back to the web server and displayed in HTML format, and links are available for downloading the results in XML or tab-separated format. Users may provide a job name to identify their job and, optionally, an email address, as the results usually take several minutes to obtain. An overview of the prediction process is provided in Supplementary Figure S1.

3 Results and discussion

3.1 Input

The server accepts input in several formats. Sequences can be submitted in FASTA format or in a space-separated format. As the topology of the membrane protein is required for calculating the features, users may also submit their own topology for the protein; it must have the same length as the sequence and may contain the following labels: ‘I’ for inside, ‘M’ for membrane, ‘O’ for outside, ‘L’ for re-entrant loops and ‘S’ for signal peptide. Since the final prediction depends on the topology provided, a user-submitted topology might influence the final results. To avoid server overload, at most 10 sequences can be submitted in one job. The sequences can also be uploaded in a single file.
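The input constraints above can be checked locally before a job is submitted. The following is a minimal sketch; the names `validate_topology`, `validate_job`, `ALLOWED_LABELS` and `MAX_SEQUENCES_PER_JOB` are hypothetical and do not describe the server's own validation code.

```python
# Topology labels listed in the text: inside, membrane, outside,
# re-entrant loop and signal peptide.
ALLOWED_LABELS = set("IMOLS")
# Job size limit stated in the text.
MAX_SEQUENCES_PER_JOB = 10


def validate_topology(sequence: str, topology: str) -> list[str]:
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    if len(sequence) != len(topology):
        problems.append(
            f"length mismatch: sequence={len(sequence)}, topology={len(topology)}"
        )
    unknown = sorted(set(topology) - ALLOWED_LABELS)
    if unknown:
        problems.append("unknown topology labels: " + ", ".join(unknown))
    return problems


def validate_job(sequences: list[str]) -> None:
    """Raise ValueError if the job exceeds the per-job sequence limit."""
    if len(sequences) > MAX_SEQUENCES_PER_JOB:
        raise ValueError(
            f"{len(sequences)} sequences submitted; "
            f"the limit is {MAX_SEQUENCES_PER_JOB} per job"
        )
```

For example, `validate_topology("MKTAY", "IIMMO")` returns an empty list, whereas a topology of a different length or one containing a label outside the five listed is reported as a problem.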

3.2 Output

Three typical HTML outputs are shown in Supplementary Figure S2. The server generates HTML output for every query protein in the following format. Each query protein appears in an expandable panel. The color of the panel indicates whether the protein is a membrane or non-membrane protein; the latter is shown with a yellow panel and a ‘non-TMP’ label (Supplementary Fig. S2C). When the protein is predicted to be a membrane protein by CCTOP (or a topology was provided), a green or red panel indicates whether the protein was predicted to be crystallizable (Supplementary Fig. S2A) or non-crystallizable (Supplementary Fig. S2B), respectively.
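The color coding described above amounts to a small decision rule, sketched here for illustration. The function name `panel_color` and its arguments are hypothetical; only the three colors and their meanings come from the text.

```python
def panel_color(is_membrane_protein: bool, crystallizable: bool = False) -> str:
    """Map a prediction to the panel color described in Section 3.2.

    `is_membrane_protein` is True when CCTOP predicts a membrane protein
    or the user supplied a topology; `crystallizable` is only consulted
    in that case.
    """
    if not is_membrane_protein:
        return "yellow"  # shown with the 'non-TMP' label
    return "green" if crystallizable else "red"
```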

The predictions are provided numerically as well as in a slider diagram, together with the reliability of the prediction. Besides the sequence and the topology of the query, similar entries from the TargetTrack and TSTMP databases, found by a simple BLAST search, are also listed. The former aid the process by providing TargetTrack IDs of similar experiments that have already been performed. TSTMP is a database that collects human membrane proteins with existing structures that can be used for modeling the query protein (group ‘3D’), membrane proteins that can be modeled (group ‘Modelable’) and proteins without an existing structure or model (group ‘Target’). These latter proteins would become modelable if the structure of the query protein were solved. Finally, some of the calculated features are also displayed, such as the instability index or the average solvent accessible surface area.

The outputs can be downloaded in XML and tab-separated formats, which contain all of the features and outputs described above.

3.3 Direct interface

To enable programmatic access to the TMCrys server, a direct interface was established as well. The user can submit one sequence at a time with an ID and monitor the progress of the job by calling a polling interface. The results can be downloaded in either tab-separated or XML format. A template script written in Python, which can process multi-protein FASTA files, is also provided on the server.
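The client-side workflow described above (split a multi-protein FASTA file into single records, submit them one at a time, then poll until the job finishes) might be sketched as follows. This is not the template script provided on the server. The function names, the status strings `"done"` and `"error"`, and the polling defaults are all assumptions; the actual request to the direct interface is left to a caller-supplied `check_status` function because its URLs and parameters are not given here.

```python
import time
from typing import Callable, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from multi-record FASTA text."""
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Use the first word after '>' as the identifier.
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else "unnamed"), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def wait_for_job(
    check_status: Callable[[], str],
    interval: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `check_status` repeatedly until it reports a final state.

    The final states "done" and "error" are hypothetical placeholders.
    """
    for _ in range(max_polls):
        status = check_status()
        if status in ("done", "error"):
            return status
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

Each record returned by `read_fasta` would be submitted as its own job, since the direct interface takes one sequence at a time. Passing the status check in as a function keeps the polling loop independent of how the request is actually made.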

Acknowledgement

We thank László Dobson for creating the script for accessing the direct interface.

Funding

This work was supported by the Hungarian Scientific Research Fund [grant number K119287 and K125607]; ‘Momentum’ Program of the Hungarian Academy of Sciences [grant number LP2012/35]; National Research, Development and Innovation Fund of Hungary [grant number FIEK_16–1–2016–0005]; and grant of the New National Excellence Programme by the Ministry of Human Resources [grant number ÚNKP-16–2_VBK-016]. Funding for open access charge: LP2012/35.

Conflict of Interest: none declared.

References

Berman H.M. et al. (2009) The protein structure initiative structural genomics knowledgebase. Nucleic Acids Res., 37, D365–D368.

Chen L. et al. (2004) TargetDB: a target registration database for structural genomics projects. Bioinformatics, 20, 2860–2862.

Dobson L. et al. (2015) The human transmembrane proteome. Biol. Direct., 10, 31.

Hopkins A.L., Groom C.R. (2002) The druggable genome. Nat. Rev. Drug Discov., 1, 727–730.

Kouranov A. et al. (2006) The RCSB PDB information portal for structural genomics. Nucleic Acids Res., 34, D302–D305.

Kozma D. et al. (2013) PDBTM: Protein Data Bank of transmembrane proteins after 8 years. Nucleic Acids Res., 41, D524–D529.

Martin-Galiano A.J. et al. (2007) Predicting experimental properties of integral membrane proteins by a naive Bayes approach. Proteins Struct. Funct. Genet., 70, 1243–1256.

Overton I.M., Barton G.J. (2006) A normalised scale for structural genomics target ranking: the OB-Score. FEBS Lett., 580, 4005–4009.

Petersen B. et al. (2009) A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct. Biol., 9, 51.

Varga J. et al. (2017) TSTMP: target selection for structural genomics of human transmembrane proteins. Nucleic Acids Res., 45, D325–D330.

Varga J.K., Tusnády G.E. (2018) TMCrys: predict propensity of success for transmembrane protein crystallization. Bioinformatics.

Walker J.M. (ed.) (2005) The Proteomics Protocols Handbook. Humana Press, Totowa, NJ.

Xiao N. et al. (2015) protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics, 31, 1857–1859.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
