Francesco Pappalardo, Mark D. Halling-Brown, Nicolas Rapin, Ping Zhang, Davide Alemani, Andrew Emerson, Paola Paci, Patrice Duroux, Marzio Pennisi, Arianna Palladini, Olivo Miotto, Daniel Churchill, Elda Rossi, Adrian J. Shepherd, David S. Moss, Filippo Castiglione, Massimo Bernaschi, Marie-Paule Lefranc, Søren Brunak, Santo Motta, Pier-Luigi Lollini, Kaye E. Basford, Vladimir Brusic, ImmunoGrid, an integrative environment for large-scale simulation of the immune system for vaccine discovery, design and optimization, Briefings in Bioinformatics, Volume 10, Issue 3, May 2009, Pages 330–340, https://doi.org/10.1093/bib/bbp014
ABSTRACT
Vaccine research is a combinatorial science requiring computational analysis of vaccine components, formulations and optimization. We have developed a framework that combines computational tools for the study of immune function and vaccine development. This framework, named ImmunoGrid, combines conceptual models of the immune system, models of antigen processing and presentation, system-level models of the immune system, Grid computing, and database technology to facilitate discovery, formulation and optimization of vaccines. ImmunoGrid modules share common conceptual models and ontologies. The ImmunoGrid portal offers access to two kinds of simulators. Educational simulators display previously defined cases. Research simulators allow new computational models to be developed and existing ones to be tuned. The portal is accessible at <igrid-ext.cryst.bbk.ac.uk/immunogrid>.
INTRODUCTION
Vaccination is the immunologic intervention that has had the largest single positive impact on the control of infectious disease and the saving of human lives over the last 200 years. Some 50 successful vaccines have been developed during this period to control almost 30 infectious diseases [1]. There are two major classes of vaccines: prophylactic vaccines, which prevent future infections or diminish their effects, and therapeutic vaccines, which treat established disease. Advances in vaccine development come in waves driven by technological breakthroughs. Historic breakthroughs in the vaccine field include pathogen attenuation, pathogen inactivation, cell culture of viruses, genetic engineering and induction of cellular immunity. The current biotechnological revolution brings combination vaccines, advanced adjuvants, genomics and proteomics, nanotechnology-based delivery systems, immunomodulation, rational vaccine design and computational simulations. Together these improve our understanding of the human organism, its immune system and pathogens [2, 3]. These latest technological advances also offer great hope for immunologic control of noninfectious diseases, including cancer, autoimmunity, allergies and post-transplantation complications. These interventions are especially promising where alternatives (such as surgery or chemotherapy) are limited, ineffective or non-existent [4].
Unfortunately, for most diseases, variation in hosts and pathogens makes a universal vaccine a remote possibility [5]. Genomics now allows entire genomes of multiple strains of pathogenic microorganisms to be sequenced. As a result, pathogen diversity can be addressed by targeting the pan-genome, or total gene repertoire, of a given microorganism [6]. The diversity of the human immune system is enormous: the number of different immune system products is five orders of magnitude larger than the roughly 10⁶ non-immune system products encoded by the human genome. Vaccines must contain at minimum two antigenic epitopes: one to induce specific B-cell or cytotoxic T-cell responses, and one to provide T-cell help [7]. Broadly protective vaccines against multiple strains of a pathogen require cocktails of target antigens or epitopes [8, 9]. In addition, immunomodulatory components (adjuvants and various suppressors and enhancers) are added to elicit the desired immune responses. Developing combinations of adjuvant and immunomodulatory components requires considering the route and kinetics of immunization, so the resulting combinatorial space of possible adjuvants is huge [10]. Vaccine efficacy depends on multiple factors: components and formulation, the administration route [11], vaccine delivery systems [12, 13], the vaccine dose [14], and the number of vaccinations and vaccination schedules [15]. Vaccination induces a broad spectrum of regulatory cytokine responses [16]. The typical effect of vaccination is the promotion of protective immune responses; however, vaccines are also known to induce immunosuppression [17]. Therefore, the combination of bioinformatics and high-throughput immunologic assays is essential for screening potential vaccine targets [4].
The large number of possible vaccine formulations makes computational and biomathematical approaches essential for projects that aim to systematically explore and optimize vaccine formulations. We have developed a framework that combines computational tools for the study of immune function and vaccine development with experimental validation. This framework, named ImmunoGrid, combines conceptual models of the immune system, models of antigen processing and presentation, system-level immune system models, Grid computing, and database technology to facilitate vaccine discovery, formulation and optimization (Figure 1). Currently, ImmunoGrid includes selected simulators of infection, cancer and atherosclerosis, and allows simulation of antigen interactions in lymph nodes. ImmunoGrid is an example of a collaborative environment in which well-defined modeling and simulation tools are integrated with an expansive computational infrastructure. In the future, ImmunoGrid will include models, simulators and tools that address further aspects of the immune system. In principle, there is no limit to which models can be included. Thanks to the Grid infrastructure, these tools can be distributed across different physical sites while the portal serves only as the common interface. The criteria for including tools in the ImmunoGrid framework are their accuracy, their utility and their compatibility with the conceptual framework.

Figure 1. The components of ImmunoGrid and their relationships. There are five principal groups of components. The first three groups define the scientific background of the ImmunoGrid framework: the conceptual background, individual components, and models and applications. The fourth group defines the engineering background and the simulator of the immune system. The fifth group defines the means of reaching the broader community. Directed arrows show dependencies. For example, concepts, standards and validation determine contexts. Concepts define standards, and standards are then used to define the validations to be performed. Concepts and standards are used to define specific components, namely molecular and system models. Physical models, such as tumor-susceptible mice, are external to ImmunoGrid, and the observed experimental results are used to refine the concepts (feedback arrow). Molecular- and system-level models can be descriptive or predictive, and they are used for vaccine applications. These models and applications are implemented as Grid applications or databases and require integration of components. The results are then distributed to researchers and the community.
CONCEPTS, STANDARDS AND ONTOLOGIES
Standardized rules and formal concepts for the identification, description and classification of biological components and processes are used in modeling the Virtual Immune System (VIS) as part of the Virtual Physiological Human (VPH). The VPH is a framework of methods and technologies that aims to enable the investigation of the human body as a single complex system [18]. Unique numbering and controlled vocabularies (ontologies) have been developed for the formal description of two groups of molecules. The first is antigen receptors, namely antibodies or immunoglobulins (IG) and T-cell receptors (TR). The second is the major histocompatibility complex (MHC). Human MHC is known as human leukocyte antigen (HLA). Formal definitions are necessary to capture the enormous diversity of antigen receptors among individuals (10¹² IG, 10¹² TR, or >10²⁰ possible combinations of HLA per person). These definitions (rules) are crucial for a standardized analysis of the interactions between receptors and ligands and between proteins. They are used by both molecular-level models and system-level simulators. The concept definitions and ontologies are largely based on IMGT® [19, 20], which has become a widely accepted standard in immunology. IMGT focuses on antigen receptors and MHC at the molecular level and provides standardization for immunogenetics data from genomes, proteomes, genetics and 3D structures [21–24]. The ImmunoGrid extensions, currently under development, define immune concepts at the cellular, organ and organism levels, and further provide formal definitions of various immune pathologies.
ImmunoGrid is a complex environment containing multiple development, application and dissemination components (Figure 1). Each of the components is connected directly, or indirectly, to the ‘Concepts’ and ‘Standards’ components, ensuring that a common conceptual model, terminology and standards are used.
MOLECULAR-LEVEL MODELS
A large body of mathematical and computational modeling work has addressed the adaptive immune system [25], but less has addressed innate immunity. The adaptive immune system acts through two arms: humoral immunity, mediated by antibodies, and cellular immunity, mediated by T cells. Antibodies, which are produced by B cells, neutralize pathogens and antigens outside cells. T cells eliminate intracellular pathogens and cancers by killing infected or malfunctioning cells, and they also provide regulatory help for both humoral and cellular immunity. The enormous repertoires of B- and T-cell receptors (IG and TR) provide the means for recognition, immune activation and the acquisition of memory against disease. B-cell epitopes are 3D shapes on the antigen surface that are recognized by antibodies. They are mostly formed by discontinuous parts of the antigen sequence. The molecular mechanisms of IG synthesis produce a large diversity of B-cell clones. These clones are selected by specific antigens and mature into plasma cells producing highly specific antibodies. T-cell epitopes are short peptide fragments of antigens produced by the antigen processing and presentation pathways. Cytotoxic CD8+ T cells (CTLs) recognize and target cells that display foreign T-cell epitopes presented by class I peptide-MHC (pMHC) complexes. Helper CD4+ T cells recognize class II pMHC complexes and provide the regulatory signals (T-cell help) needed to activate B cells and CTLs.
The combinatorial complexity of antigen processing and presentation makes the prediction and analysis of MHC-binding peptides and T-cell epitopes well suited to large-scale screening and computational analysis. Vigorous research and development over the last 15 years has produced a large number of computational models of antigen processing and presentation. Many MHC ligands and T-cell epitopes have been stored in specialist databases [26–29]. Techniques including motifs, quantitative matrices, artificial neural networks, hidden Markov models, support vector machines and molecular modeling have been developed [30, 31] and deployed as web servers. The antigenic epitope prediction tools, or ‘molecular-level models’, used in ImmunoGrid are principally the CBS tools [32–37], complemented by several other predictive models [38–40]. The predictive models and their modes of use were selected through exhaustive validation against carefully chosen experimental data. ImmunoGrid focuses on practical applicability to vaccine research. The models are therefore applied in the manner that best suits vaccine discovery and the validation process (see the example in Figure 2). Using this approach, we have analyzed 40 000 influenza proteins for peptides that bind to some 50 HLA alleles.
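Of the techniques listed above, the quantitative matrix is the simplest to show concretely. The following minimal sketch assumes an additive position-specific scoring matrix. The function name `scan_protein` and the toy matrix values are hypothetical, and this is not the scoring used by the CBS tools.

```python
from typing import Dict, List, Tuple

# A quantitative matrix: one dict per peptide position, mapping an amino-acid
# letter to an additive score. Residues missing from a position score 0.0.
Matrix = List[Dict[str, float]]


def scan_protein(protein: str, matrix: Matrix,
                 threshold: float) -> List[Tuple[int, str, float]]:
    """Score every window of len(matrix) residues and return
    (start, peptide, score) for windows at or above threshold, best first."""
    k = len(matrix)
    hits: List[Tuple[int, str, float]] = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        # Additive model: the peptide score is the sum of per-position scores.
        score = sum(pos.get(aa, 0.0) for pos, aa in zip(matrix, peptide))
        if score >= threshold:
            hits.append((start, peptide, score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Toy 9-mer matrix (hypothetical values): rewards L/M at position 2 and
# V/L at position 9, loosely in the style of an anchor-residue motif.
toy_matrix: Matrix = [{} for _ in range(9)]
toy_matrix[1] = {"L": 2.0, "M": 1.5}
toy_matrix[8] = {"V": 2.0, "L": 1.0}

print(scan_protein("GLGGGGGGVGG", toy_matrix, threshold=1.0))
# → [(0, 'GLGGGGGGV', 4.0)]
```

Real predictors replace the toy matrix with one derived from binding data for each HLA allele. The sliding-window scan is then repeated for every allele and protein.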

Figure 2. An example of the integration of molecular-level models for the discovery of target T-cell epitopes. A set of molecular models (covering multiple HLA variants) is used to analyze a complete set of viral antigens. Predictions are performed using Grid computing to accommodate a large number of prediction jobs. Predicted vaccine targets are then stored in a relational database as annotated entries. Subsequent user requests for analysis of candidate T-cell epitopes from one, several or all antigens are served by data mining of the database of predicted targets.
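The Figure 2 workflow can be sketched with Python's built-in `sqlite3` module. In that workflow, predictions are computed once, the annotated results are stored, and later requests are answered by data mining. This is a minimal sketch under stated assumptions. The table layout, the `precompute` and `candidate_epitopes` helpers and the `toy_predict` scorer are hypothetical. The nested loop runs sequentially here, whereas a Grid deployment would dispatch each (protein, allele) pair as a separate job.

```python
import sqlite3
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical predictor signature: (protein sequence, allele) -> [(peptide, score)].
Predictor = Callable[[str, str], List[Tuple[str, float]]]


def precompute(conn: sqlite3.Connection,
               proteins: Iterable[Tuple[str, str]],
               alleles: Sequence[str],
               predict: Predictor) -> None:
    """Run each (protein, allele) prediction once and store every hit.
    On a Grid, each (protein, allele) pair would be an independent job."""
    conn.execute("CREATE TABLE IF NOT EXISTS predictions ("
                 "protein_id TEXT, allele TEXT, peptide TEXT, score REAL)")
    for protein_id, sequence in proteins:
        for allele in alleles:
            rows = [(protein_id, allele, peptide, score)
                    for peptide, score in predict(sequence, allele)]
            conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def candidate_epitopes(conn: sqlite3.Connection, protein_ids: Sequence[str],
                       min_score: float) -> List[tuple]:
    """Answer a later user request by querying stored predictions only."""
    placeholders = ",".join("?" for _ in protein_ids)
    sql = ("SELECT protein_id, allele, peptide, score FROM predictions "
           f"WHERE protein_id IN ({placeholders}) AND score >= ? "
           "ORDER BY score DESC")
    return conn.execute(sql, [*protein_ids, min_score]).fetchall()


def toy_predict(sequence: str, allele: str) -> List[Tuple[str, float]]:
    """Hypothetical scorer: every 9-mer, scored by its count of L residues.
    The allele argument is ignored in this toy."""
    return [(sequence[i:i + 9], float(sequence[i:i + 9].count("L")))
            for i in range(len(sequence) - 8)]


conn = sqlite3.connect(":memory:")
precompute(conn, [("p1", "LLLLLLLLLA"), ("p2", "AAAAAAAAA")], ["A1", "A2"],
           toy_predict)
print(candidate_epitopes(conn, ["p1"], 8.5))
```

Separating the expensive pre-calculation from the cheap query step is what lets one Grid run serve many later user requests.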
The testing and assessment of these models shows that the predictions of peptide binding to HLA class I molecules are of high accuracy and therefore directly applicable to identification of vaccine targets [41–43]. Furthermore, we have shown that these models are stable and their performance can be reproduced across several major HLA class I variants [44].
Predictions of MHC class II ligands and T-cell epitopes are more complex: pMHC-II binding predictions are much less accurate than pMHC-I predictions [45, 46]. Their accuracy can be improved by using genetic algorithms [47] or Gibbs sampling [48], or by combining the outputs of multiple predictors [49]. For the time being, HLA class I predictions can be used to identify positives (binders and T-cell epitopes), while HLA class II predictions can be used to eliminate obvious negatives (non-binders and non-epitopes). Preliminary results come from combining two sources. The first is computational predictions, which guided the selection of peptides for T-cell activation experiments. The second is cytokine profiles measured in experimental validation studies of viral proteomes. These results indicate that the concept of a ‘T-cell epitope’ cannot be linked to MHC binding alone. Instead, it depends on a broader panel of biological phenomena, including peptide processing, peptide availability in various tissues and T-cell repertoires.
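The division of labor described above can be expressed as two simple filters. Class I predictions select positives, while class II predictions only discard obvious negatives. This is a minimal sketch that assumes each predictor reports a percentile rank, where lower means stronger predicted binding. The cutoffs (2 and 50), the predictor names and the unanimity rule for class II are illustrative assumptions, not the published ImmunoGrid settings.

```python
from typing import Dict, Set

# Percentile ranks per peptide: lower = stronger predicted binding (assumed
# convention; the cutoffs below are hypothetical).
Ranks = Dict[str, float]


def class_i_positives(ranks: Ranks, cutoff: float = 2.0) -> Set[str]:
    """Class I: predictions are accurate enough to call strong binders positive."""
    return {pep for pep, rank in ranks.items() if rank <= cutoff}


def class_ii_drop_obvious_negatives(ranks_by_predictor: Dict[str, Ranks],
                                    cutoff: float = 50.0) -> Set[str]:
    """Class II: keep every peptide unless all predictors that scored it
    agree it is a non-binder (rank at or above the cutoff)."""
    peptides = set().union(*(r.keys() for r in ranks_by_predictor.values()))
    return {pep for pep in peptides
            if any(r[pep] < cutoff
                   for r in ranks_by_predictor.values() if pep in r)}


print(class_i_positives({"PEP1": 0.5, "PEP2": 10.0}))  # → {'PEP1'}
print(sorted(class_ii_drop_obvious_negatives({
    "predictor_a": {"X": 80.0, "Y": 10.0, "Z": 60.0},
    "predictor_b": {"X": 70.0, "Y": 90.0, "Z": 40.0},
})))  # → ['Y', 'Z']
```

The asymmetry is deliberate. A lenient class II filter loses few true epitopes even when individual predictors are unreliable.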
SYSTEM-LEVEL MODELS
Our knowledge of the immune system is incomplete and imprecise. Mathematical modeling provides a formal description of the underlying principles and organization of the immune system and of the relationships between its components. The immune system is a distributed system lacking central control, yet it performs complex functions efficiently and effectively [50]. Mathematical modeling of the immune system requires understanding both the properties of the types of mathematical models involved and the source data used for modeling. These methodologies have strengths and limitations that must be understood to develop effective models from these data. For example, the source data often need transformation (normalization, filtering or other pre-processing) before they can be used for model development. Mathematical models of the immune system evolved from classical models that used differential equations, difference equations or cellular automata to model a small number of interactions, molecules or cell types involved in immune responses. The principal types of mathematical models for immunological applications are described in [51, 52]. Several key enabling technologies have provided immunology with large quantities of data describing the molecular profiles of both physiological and pathological states. These are genomics [53], proteomics [54, 55], bioinformatics [56] and systems biology, including genomic pathway analysis [57]. Assays for immune monitoring keep improving and expanding our ability to measure various immunoregulatory and modulatory molecules [58]. These assays include multiparametric flow cytometry, nanotechnology for quantitation of cytokine production, ELISPOT, intracytoplasmic cytokine staining and mRNA/microRNA-based assays. Laser scanning cytometry enables the measurement and analysis of the effector function of individual cells in situ. It thus allows quantification of molecular and cellular events in physiological and pathological states [59]. These emerging technologies provide previously unavailable data that enable more detailed modeling of immune processes.
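To make ‘classical models using differential equations’ concrete, here is a generic two-population toy model of pathogen and effector cells, integrated with the forward Euler method. It is a hypothetical illustration of the model class mentioned above, not one of the ImmunoGrid simulators, and all parameter values are arbitrary.

```python
from typing import List, Tuple


def simulate(p0: float, e0: float, r: float, k: float, s: float, a: float,
             d: float, dt: float, steps: int) -> List[Tuple[float, float]]:
    """Forward-Euler integration of a toy pathogen (P) / effector-cell (E) model:
        dP/dt = r*P - k*P*E        (pathogen growth, killing by effectors)
        dE/dt = s + a*P*E - d*E    (supply, antigen-driven expansion, death)
    Returns the (P, E) trajectory including the initial state."""
    p, e = p0, e0
    trajectory = [(p, e)]
    for _ in range(steps):
        dp = r * p - k * p * e
        de = s + a * p * e - d * e
        # Clamp at zero: Euler steps can overshoot, but counts cannot be negative.
        p = max(p + dt * dp, 0.0)
        e = max(e + dt * de, 0.0)
        trajectory.append((p, e))
    return trajectory


# Hypothetical parameters chosen only to show the mechanics.
traj = simulate(p0=1.0, e0=1.0, r=0.5, k=0.2, s=0.1, a=0.05, d=0.1,
                dt=0.01, steps=2000)
print(traj[-1])
```

Forward Euler keeps the sketch short. For stiff or long simulations, an adaptive solver such as `scipy.integrate.solve_ivp` would be the more robust choice.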
Mathematical modeling of the immune system has grown to include the extensive use of simulations [51] and the iterative refinement of immune system models at both the molecular [60] and the system level [61]. ImmunoGrid system-level models draw on two sources. The first is the cellular automata model of the immune system originally developed by Celada and Seiden [62]. The second is spatial physical models that use partial differential equations to describe lymph nodes, chemotaxis, cell movement and diffusion. The two key models of the immune system are C-ImmSim [63] and SimTriplex [15]. Both models include components of adaptive and innate immunity, and the core function of both is the modeling of adaptive immunity (both humoral and cellular arms are included). C-ImmSim is a generic model that has been applied to modeling primary and secondary immune responses, bacterial infection and viral infection. It has been used in descriptive studies of HIV infection [64, 65], Epstein-Barr virus infection [66] and cancer immunotherapy [67]. The first version of SimTriplex was derived from C-ImmSim with a focus on predictive modeling, and later versions evolved independently. SimTriplex has been applied to predictive modeling of an immunoprevention vaccine [15, 61, 68, 69] and, more recently, to descriptive modeling of atherosclerosis [70]. The most recent models developed for inclusion in ImmunoGrid are physical models of tumor growth based on nutrient or oxygen starvation, using the lattice Boltzmann method [71, 72]. These models have enabled the simulation of growth of both benign and malignant tumors [manuscript in preparation]. ImmunoGrid also has a lymph node simulator offering a mechanistic view of the interactions between antigen and immune system cells in the lymph node.
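In the Celada and Seiden cellular automaton and its descendants such as C-ImmSim, receptors and antigenic determinants are represented as binary strings. The chance of binding grows with the number of complementary bits (their Hamming distance). The following sketch shows that idea only. The specific affinity curve in `binding_probability` is an assumption for illustration: zero below a threshold, then geometric growth from `p_min` to 1. All names and parameter values are likewise assumptions, not the published C-ImmSim or SimTriplex formulas.

```python
import random


def complementary_bits(receptor: int, epitope: int, nbits: int) -> int:
    """Hamming distance between two nbits-long bit strings: the number of
    positions where the receptor bit is the complement of the epitope bit."""
    mask = (1 << nbits) - 1
    return bin((receptor ^ epitope) & mask).count("1")


def binding_probability(receptor: int, epitope: int, nbits: int,
                        threshold: int, p_min: float) -> float:
    """Hypothetical affinity curve: 0 below `threshold` complementary bits,
    p_min exactly at the threshold, rising geometrically to 1.0 for a perfect
    complement (all nbits differ)."""
    if not 0 <= threshold < nbits:
        raise ValueError("threshold must satisfy 0 <= threshold < nbits")
    m = complementary_bits(receptor, epitope, nbits)
    if m < threshold:
        return 0.0
    return p_min ** ((nbits - m) / (nbits - threshold))


def encounter(receptor: int, epitope: int, nbits: int, threshold: int,
              p_min: float, rng: random.Random) -> bool:
    """Stochastic binding event when a receptor meets an epitope on a lattice site."""
    return rng.random() < binding_probability(receptor, epitope, nbits,
                                              threshold, p_min)


rng = random.Random(42)
# Count successful bindings of a near-complementary pair over 1000 encounters.
hits = sum(encounter(0b00000000, 0b11111110, 8, 6, 0.05, rng)
           for _ in range(1000))
print(hits)
```

In a full automaton, such encounters happen between cells and molecules that share a lattice site at each time step. Successful binding then triggers activation, proliferation or killing rules.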
In ImmunoGrid, our main concern is the practical applicability of computational models to vaccine discovery, formulation, optimization and scheduling. The models must therefore be both biologically realistic and mathematically tractable [73]. The models used in ImmunoGrid are validated experimentally and incrementally refined (tuned). Specific applications that have been investigated include mammary cancer immunoprevention, therapeutic approaches to melanoma and lung cancer, immunization with influenza epitopes, HIV infection and HAART therapy, and atherosclerosis.
In ImmunoGrid we use predictive models that are carefully validated and refined iteratively (predict-test-refine) using selected experiments. One such model covers an immunoprevention vaccine against mammary carcinoma in genetically susceptible mice. It has been validated experimentally and shown to accurately reproduce immune responses up to 52 weeks of age [15, 61, 68, 69], while additional tuning is needed for later stages of the disease. Vaccination data indicate that the rate of immune responses is lower in older mice (e.g. in the second year of life) than in younger mice (e.g. in the first year of life) (P.L. Lollini, unpublished data). Accordingly, the model needs additional tuning for older mice. The model of HIV-1 infection, in untreated patients as well as in patients receiving HAART, has been tuned against data from the literature and clinical observations [64, 65]. The CBS tools have been assessed as the best-performing prediction systems for HLA binders and T-cell epitopes; details can be found in [43, 45].
The model of atherogenesis is descriptive and was tuned to match published experimental data; detailed descriptions are available in [70]. The descriptive lymph node simulator reproduces experimental data published in [74–76].
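The predict-test-refine cycle and the tuning of descriptive models can be illustrated with a one-parameter fit. This is a minimal sketch assuming a hypothetical saturating response curve and a grid search over candidate rates. In each round, the current parameter first predicts a new batch of observations (test). It is then refitted on all data seen so far (refine). This is not the tuning procedure used for the mammary carcinoma, HIV or atherosclerosis models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float, float], float]  # (time, parameter) -> predicted response


def sse(model: Model, param: float, times: Sequence[float],
        observed: Sequence[float]) -> float:
    """Sum of squared errors between model predictions and observations."""
    return sum((model(t, param) - y) ** 2 for t, y in zip(times, observed))


def tune(model: Model, candidates: Sequence[float], times: Sequence[float],
         observed: Sequence[float]) -> float:
    """Grid search: pick the candidate parameter that best fits the data."""
    return min(candidates, key=lambda p: sse(model, p, times, observed))


def predict_test_refine(model: Model, candidates: Sequence[float],
                        batches: Sequence[Tuple[List[float], List[float]]]):
    """For each new experimental batch: first predict it with the current
    parameter (test), then refit on all data seen so far (refine)."""
    times: List[float] = []
    observed: List[float] = []
    history: List[Tuple[float, float]] = []  # (parameter used, prediction error)
    param = None
    for batch_times, batch_obs in batches:
        if param is not None:
            history.append((param, sse(model, param, batch_times, batch_obs)))
        times.extend(batch_times)
        observed.extend(batch_obs)
        param = tune(model, candidates, times, observed)
    return param, history


# Hypothetical saturating response curve with one rate parameter.
def response(t: float, rate: float) -> float:
    return 1.0 - math.exp(-rate * t)


true_rate = 0.3
batches = [([1.0, 2.0], [response(1.0, true_rate), response(2.0, true_rate)]),
           ([4.0, 8.0], [response(4.0, true_rate), response(8.0, true_rate)])]
candidates = [i / 10 for i in range(1, 11)]
print(predict_test_refine(response, candidates, batches))
# → (0.3, [(0.3, 0.0)])
```

The recorded prediction errors show when a model stops generalizing. In the mouse example above, this happened beyond 52 weeks of age, which is when further tuning is needed.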
INTEGRATED SYSTEM AND GRID COMPUTING
Integrated systems for data management and laboratory automation are emerging as cyberinfrastructure, a new research environment that enables research using Internet resources [77, 78]. These systems mainly deal with the management and analysis of data. They address the bottleneck created by huge and rapidly growing quantities of data and the need to automate multi-step, large-scale screening. Continuous developments in information and communication technologies and computational intelligence [79] have led to the concept of the ‘Virtual Laboratory’. The Virtual Laboratory offers an integrated, intelligent environment for the systematic production of high-dimensional, quality-assured data. This contrasts with the common approach, in which independent exploratory studies are combined in an ad hoc manner [80]. The ImmunoGrid environment provides a number of modules that can be combined to address specific vaccine questions. These include Grid computing for large-scale tasks and the sharing of distributed data. Each vaccine study performed within the ImmunoGrid framework begins with conceptual modeling of the disease or pathology. Models are then used to simulate the immune system and related processes. Descriptive models are tuned to the available data sets. Predictive models are developed incrementally, starting from simple solutions that are progressively validated and expanded. The ImmunoGrid framework enables the integration of molecular- and system-level models for vaccine research; an example is shown in Figure 3.

Figure 3. An example of the integration of system-level models for the discovery, formulation or optimization of vaccines. First, a tumor sample is profiled for the presence of tumor antigens and for the profile of the tumor microenvironment, including immunoregulatory molecules. T-cell epitopes are predicted using molecular-level models. A small number of experiments are followed by further computational simulations. ‘Multiplex’ refers to the analysis of multiple components in a vaccine (SimTriplex contains three components). Massive-scale simulations can be performed using Grid computing, and the simulation results can be stored in a database for future analysis. The iterative process of initial experiments → simulation → validation ultimately leads to a vaccine. It can be used to formulate a vaccine and to optimize its formulation, dosage and scheduling. This approach has been used to optimize immunoprevention vaccine scheduling with SimTriplex in mouse models of mammary cancer. Molecular-level models are linked to system-level models through predicted T-cell epitopes.
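The Figure 3 loop uses many cheap virtual experiments to choose a few expensive real ones. It can be sketched as a parallel screen over candidate vaccination schedules. Everything here is hypothetical: the schedule encoding, the `simulate_mouse` response model (a stand-in for a SimTriplex run) and the cohort size. A `ThreadPoolExecutor` stands in for Grid worker nodes, only to show the fan-out of independent jobs. Because of Python's global interpreter lock, it would not speed up CPU-bound simulations. A real deployment would submit them as separate Grid jobs.

```python
import itertools
import math
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List, Sequence, Tuple

Schedule = Tuple[int, ...]  # hypothetical encoding: weeks at which a dose is given


def candidate_schedules(weeks: Sequence[int], max_doses: int) -> List[Schedule]:
    """Enumerate every schedule of 1..max_doses doses drawn from the given weeks."""
    return [combo for n in range(1, max_doses + 1)
            for combo in itertools.combinations(weeks, n)]


def simulate_mouse(schedule: Schedule, mouse: int) -> float:
    """Stand-in for one virtual-mouse run (hypothetical response model):
    protection rises with the number of doses, plus per-mouse noise."""
    rng = random.Random(f"{schedule}-{mouse}")  # reproducible per (schedule, mouse)
    base = 1.0 - math.exp(-0.5 * len(schedule))
    return min(1.0, max(0.0, base + rng.uniform(-0.05, 0.05)))


def score_schedule(schedule: Schedule, n_mice: int) -> float:
    """Average protection over a cohort of virtual mice."""
    return mean(simulate_mouse(schedule, m) for m in range(n_mice))


def screen(candidates: Sequence[Schedule], n_mice: int, top_k: int,
           workers: int = 4) -> List[Tuple[Schedule, float]]:
    """Score all candidates concurrently and keep the top_k for lab testing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: score_schedule(s, n_mice), candidates))
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return ranked[:top_k]


schedules = candidate_schedules(range(0, 12, 2), max_doses=3)  # 41 candidates
for schedule, score in screen(schedules, n_mice=20, top_k=5):
    print(schedule, round(score, 3))
```

Every (schedule, mouse) pair is independent, so the screen is embarrassingly parallel. This independence is what makes it a good fit for Grid computing.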
In ImmunoGrid, the Grid is primarily a ‘virtual organization’ that provides access to heterogeneous computational and administrative resources through a single point of access [81]. The main requirements for the ImmunoGrid simulator are processing power, data federation, visualization of results and user portals. Initial benchmarking covered 1600 vaccine schedules in 100 virtual mice, simulating 24 months of mouse life. These large-scale simulations can be completed within 26 h of wall time (the time between job submission and receipt of the results). Modeling mouse immune responses is a step towards modeling the human immune system. Overall, the mouse immune system is very similar to the human immune system, with the added advantage that experiments can be performed readily. Mouse models are routinely used in vaccine research and also serve as benchmarks for bioinformatics-driven vaccine research. ImmunoGrid development is based on the analysis of HLA as well as on scaling up the mouse ImmunoGrid developments. Grid computing therefore enables robust and scalable solutions for immune system modeling in ImmunoGrid.
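Assuming each of the 1600 schedules is simulated once in each of the 100 virtual mice (the text does not state the exact job decomposition), the benchmark corresponds to roughly the following throughput:

```latex
N = 1600 \times 100 = 1.6 \times 10^{5}\ \text{runs}, \qquad
\frac{N}{26\ \text{h}} \approx 6.2 \times 10^{3}\ \text{runs/h} \approx 1.7\ \text{runs/s}.
```

Since each run covers 2 years of mouse life, this is about 3.2 × 10^5 simulated mouse-years per benchmark.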
Each model within the ImmunoGrid simulator includes integrated user notes, accessible by clicking on the question marks. Tutorial notes are also available on the website.
EDUCATIONAL AND RESEARCH ImmunoGrid PORTAL
The ImmunoGrid simulator can be accessed through the prototype portal website [82]. The main aims of the ImmunoGrid simulator are:

- to provide tools that enhance our understanding of the immune system;
- to support vaccine design, formulation and optimization;
- to provide an interactive educational model of the human immune system;
- to apply computer modeling and Grid infrastructure to complex scientific applications.

The modular architecture of ImmunoGrid enables a content hierarchy and reuse of modules. The educational simulators are based on learning objects [83] and currently include the following simulation modules: cancer vaccine scheduling, antigen processing and presentation, and bacterial replication rates. Learning objects enable students to select multiple scenarios and observe how changing the inputs affects the outcome. This strategy promotes student-centered learning, in which students learn through activities such as querying and problem-solving. The learning objects present information in highly visual and interactive formats that support deep learning. Educational ImmunoGrid will be expanded to include, among other topics, the analysis of T-cell epitopes and immunological hot-spots, tumor growth, tumor regression and atherosclerosis.
Access to the research portal is currently restricted to ImmunoGrid consortium members and selected projects. Very large-scale projects involve several steps: pre-calculation of results using the Grid infrastructure, storage of the results in a database, and access to the results by querying the database. One example is the prediction of peptide binders for several hundred HLA alleles within 40 000 influenza sequences. Smaller-scale projects produce results by direct access to computational resources. Educational simulators are developed from representative examples drawn from earlier projects. The ImmunoGrid portal is accessible at <igrid-ext.cryst.bbk.ac.uk/immunogrid>. At present, the educational simulator contains selected examples of simulations of cancer vaccine scheduling, cancer growth, viral infection (influenza, HIV and EBV), an atherosclerosis simulator, a mechanistic lymph node model, and antigen processing and presentation predictions.
CONCLUSIONS AND DISCUSSION
The ImmunoGrid simulator belongs to a new generation of systems that fit within the framework of the VPH Initiative [84]. It enables both descriptive and predictive modeling of the immune system in humans and in animal models. ImmunoGrid promotes a new, technology-driven, systematic approach to vaccine research. ImmunoGrid is neither a data management nor a laboratory automation framework. It is a distributed system of databases and computational simulators of the immune system and immune processes, and it enables vaccine development applications. ImmunoGrid combines computational simulations and experimental validation with iterative improvement of in silico screening. It represents a new concept of integrated research environment in which virtual experiments are tightly bound to laboratory experiments and clinical observations. The combination of these resources helps improve the economics of vaccine discovery and development. It does so by considering a large proportion of the combinatorial space of vaccine targets, a quest not possible using traditional methods.
Two representative accomplishments show the utility of ImmunoGrid. First, ImmunoGrid is, to our knowledge, the first true Grid application for immune system modeling. The Grid infrastructure has enabled parallel screening of tens of thousands of viral sequence variants against thousands of HLA variants. Peptide lengths of 8–11 residues (HLA class I) and 9–15 residues (HLA class II) were screened. Such large-scale applications are now accessible as online resources. Second, an immunoprevention vaccine in genetically susceptible mice was studied by combining computational modeling with experimentation. This combination sped up the project and produced significant savings in the search for an optimal vaccination schedule. Hundreds of thousands of virtual experiments were used to select tens of key mouse experiments. A rough estimate is that the time and cost of experimentation were reduced by two-thirds.
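The peptide lengths quoted above determine how many candidate peptides each protein contributes. For a protein of L residues (with L ≥ 15), counting every overlapping window of each length k gives:

```latex
N_{\mathrm{I}}(L) = \sum_{k=8}^{11} (L - k + 1) = 4L - 34, \qquad
N_{\mathrm{II}}(L) = \sum_{k=9}^{15} (L - k + 1) = 7L - 77.
```

For a hypothetical 500-residue protein, this gives N_I = 1966 and N_II = 3423 candidate peptides per allele. The total workload is this count summed over all protein variants and multiplied by the number of HLA alleles screened. This illustrates why such screening benefits from distribution over the Grid.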
Future developments of ImmunoGrid will include refinements of the existing models, additional models and simulators, and extension to other diseases. Researchers who wish to contribute their models and tools to ImmunoGrid, or to use the research ImmunoGrid, should contact the corresponding author or the contact person indicated on the ImmunoGrid website. Models considered for inclusion in ImmunoGrid must meet three requirements. They must be validated, they must agree with well-established experimental or clinical observations, and their source code must be freely accessible.
ImmunoGrid combines conceptual models of the immune system, models of antigen processing and presentation, system-level models of the immune system, Grid computing and database technology to facilitate vaccine discovery, formulation and optimization.
Molecular-level models address the combinatorial complexity of antigen processing and presentation. In combination with Grid computing, such prediction systems enable exhaustive parallel screening of thousands of pathogen proteomes for hundreds of MHC molecules.
System-level models enable simulations of millions of experiments. Years of experimentation can be simulated in hours, helping to select the ‘best’ or ‘most informative’ experiments for actual screening.
FUNDING
The ImmunoGrid project has been funded by the EC contract FP6-2004-IST-4, No. 028069.