Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions, including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython, and (ii) in script mode, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners, while script mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been used to implement algorithms demonstrating protein docking, protein folding, loop modeling and design.
Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
1 INTRODUCTION
Recent advances in molecular modeling have led to its increasing use in structural biology research for a wide range of applications. The Rosetta biomolecular modeling suite, in particular, has proved effective in many diverse tasks including ab initio structure prediction and homology modeling (Raman et al., 2009), protein and small-molecule docking (Chaudhury and Gray, 2007; Davis and Baker, 2009), loop modeling (Mandell et al., 2009) and design (Kuhlman et al., 2003). To make these protocols more accessible, a number of web-based servers have been constructed, such as Robetta (Chivian et al., 2004), RosettaDock (Lyskov and Gray, 2008) and RosettaAntibody (Sivasubramanian et al., 2008). However, many modeling problems do not fit cleanly into one of the standard Rosetta protocols, and algorithms that combine elements from different methods within Rosetta are often required to adequately model a particular system. Developing such algorithms requires extensive experience in both C++ programming and Rosetta software development, which severely limits the accessibility of custom modeling.
To make custom molecular modeling using Rosetta accessible to a broader community of structural biologists, we developed PyRosetta, a Python-based implementation of the Rosetta molecular modeling suite. Our goal was to enable users to define a molecular modeling problem, design an algorithm to solve it and implement that algorithm on the computer using preexisting Rosetta objects and functions. PyRosetta takes advantage of the object-oriented architecture of the new Rosetta release v3.1 to provide users with easy access to all the major functions and objects used by Rosetta developers (Leaver-Fay, A. et al., manuscript in preparation). PyRosetta can be run in two modes: interactive mode, which offers tab-completion and help features that are ideal for beginners, and script mode, which is better suited for algorithm development. We chose Python as the scripting language because it is a sophisticated programming language that enjoys widespread use in the biology community and allows PyRosetta to be compatible with other Python-based packages such as PyMOL (DeLano, 2002) and Biopython (Cock et al., 2009). Our hope is that the extensive online communities of users of the many Python-based bioinformatics tools will help develop and share interfaces with PyRosetta. Since familiarity with Rosetta objects and functions is essential for new users, a tutorial, user's manual and sample scripts demonstrating usage are available on the web site.
We used a number of tools to convert the classes and functions in the Rosetta C++ source code into a Python-accessible form. GCC-XML (Kitware Inc., 2007) parses the classes and functions of the Rosetta C++ code into an XML representation using the GCC compiler. The Py++ package (Language Binding Project, 2009) uses the GCC-XML objects and generates Python bindings using the Boost.Python library (Boost, 2009). To make this feasible for over 2000 Rosetta objects, the entire process is automated. The scripts are portable and tested on Mac OS X, Linux and Windows platforms. The build process requires 4–6 h depending on the platform, and pregenerated binary libraries are provided for download for all three platforms. A version of PyRosetta will be made available with each new release of Rosetta, along with intermediate versions that add features, fix bugs, improve accessibility or expand documentation. In terms of computational cost, PyRosetta performs almost identically to the C++ build of Rosetta, with performance benchmarks indicating a <5% difference in speed.
2 ROSETTA APPROACH
Molecular modeling in Rosetta for structure prediction and design relies on the thermodynamic principle that the configuration of a biomolecular system at equilibrium tends toward the one that is lowest in free energy. The free energy of a given configuration (structure and sequence) is approximated using a score function that uses mathematical models of the major biophysical forces (van der Waals energies, hydrogen bonding, electrostatics, solvation energies, etc.) as a function of the configuration. Since the size and complexity of the configurational space accessible to the system make exhaustive sampling impossible, the starting structures and sampling strategies vary across different modeling applications. Furthermore, different energy scoring components carry different degrees of importance in different modeling applications. The necessity of tailored sampling and scoring strategies underscores the need for a generalized approach to implementing custom molecular modeling algorithms.
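As a rough illustration of how a score function can combine energy components whose importance differs between applications, the sketch below computes a weighted sum of per-term energies. It is a toy example in plain Python and does not use the PyRosetta API; the function `total_score`, the term names, the weights and the energy values are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: the term names, weights and energy values below are
# hypothetical and are not Rosetta's actual score terms or parameters.

def total_score(term_energies, weights):
    """Return the weighted sum of the energy terms listed in `weights`."""
    return sum(weights[name] * term_energies[name] for name in weights)

# Unweighted energies computed for one configuration (made-up numbers).
term_energies = {
    "van_der_waals": -12.0,
    "hydrogen_bonding": -4.0,
    "electrostatics": -2.5,
    "solvation": 6.0,
}

# Two modeling applications may weight the same terms differently.
weights_a = {"van_der_waals": 1.0, "hydrogen_bonding": 1.0,
             "electrostatics": 0.5, "solvation": 1.0}
weights_b = {"van_der_waals": 0.5, "hydrogen_bonding": 2.0,
             "electrostatics": 1.0, "solvation": 0.5}

score_a = total_score(term_energies, weights_a)
score_b = total_score(term_energies, weights_b)
```

The same configuration receives a different total score under each weight set, which is why the choice of weights has to be matched to the modeling application.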
Rosetta protocols generally sample the relevant configurational space for a given modeling application by running a large number of relatively short Monte Carlo trajectories starting from random or semi-random starting configurations, storing the lowest-energy structures (called decoys) from each trajectory, and then selecting the lowest-energy decoys as predictions. To tackle a wide range of biomolecular modeling problems, it is necessary to precisely define the relevant configurational space for sampling, the search strategy and the score function used for both sampling and identifying the lowest-energy structures.
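The sampling scheme described above can be sketched with a toy model in plain Python. The example below is not PyRosetta code: the one-dimensional `toy_energy` landscape, the step size, the temperature factor `kT`, the trajectory counts and all function names are hypothetical, and the Metropolis acceptance criterion is assumed here as one common choice for Monte Carlo trajectories rather than taken from the text.

```python
import math
import random

def toy_energy(x):
    """Hypothetical 1D energy landscape with its global minimum at x = 2."""
    return (x - 2.0) ** 2

def run_trajectory(rng, n_steps=200, step_size=0.5, kT=1.0):
    """Run one short Metropolis Monte Carlo trajectory.

    Returns (lowest_energy, x_at_lowest_energy) seen along the trajectory.
    """
    x = rng.uniform(-10.0, 10.0)          # random starting configuration
    energy = toy_energy(x)
    best_energy, best_x = energy, x
    for _ in range(n_steps):
        trial_x = x + rng.uniform(-step_size, step_size)  # random perturbation
        trial_energy = toy_energy(trial_x)
        delta = trial_energy - energy
        # Metropolis criterion (assumed): always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, energy = trial_x, trial_energy
            if energy < best_energy:
                best_energy, best_x = energy, x
    return best_energy, best_x

def generate_decoys(n_trajectories=50, seed=0):
    """Run independent trajectories and return their decoys, lowest energy first."""
    rng = random.Random(seed)
    decoys = [run_trajectory(rng) for _ in range(n_trajectories)]
    decoys.sort()
    return decoys

decoys = generate_decoys()
predictions = decoys[:5]   # the lowest-energy decoys serve as predictions
```

Each trajectory contributes one decoy, and only the lowest-energy decoys across all trajectories are kept as predictions. In a real application the configuration, the perturbation and the energy would be replaced by the structure representation, sampling moves and score function appropriate to the modeling problem.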
3 PYROSETTA FEATURES
In PyRosetta, a biomolecular system is represented by an object called the
The energy of a structure, or
Sampling functions are written as
In addition to the basic movers, there are movers that execute standard Rosetta protocols. Examples include
4 CONCLUSION
PyRosetta is a Python-based implementation of the Rosetta molecular modeling package that enables users to implement molecular modeling algorithms using preexisting Rosetta objects and functions, for purposes ranging from simple scripts to sophisticated modeling protocols, and to run them on their own computational resources. PyRosetta is a stand-alone package requiring only Python 2.5 to be installed and is currently available for download from the web site (www.pyrosetta.org), along with a user's manual and sample scripts that demonstrate usage. For new users, we have written a set of interactive educational modules available both electronically and in a bound form (Gray et al., 2009). The modules use PyRosetta to lead users from the fundamentals of biomolecular structure and energetics through algorithm creation for applications in structure prediction and design.
In the future, both Rosetta developers and outside users will be able to upload and share scripts with the PyRosetta community through the web site. The features described here are only a small subset of those available. Potential users are referred to the web site for more information.
We acknowledge William Sheffler for developing the first Python bindings to Rosetta. John D. Bagert and Julian N. Rosenberg assisted in the early development through a Technology Fellowship from the Johns Hopkins University Center for Educational Resources. Finally, we acknowledge the efforts of all the Rosetta developers within RosettaCommons (www.rosettacommons.org), whose scientific research and software development have made PyRosetta possible.
Funding: PyRosetta was funded through National Institutes of Health (R01-GM73151 and R01-GM078221); National Science Foundation CAREER Grant CBET (0846324).
Conflict of Interest: none declared.