The Center for Computational Biology: resources, achievements, and challenges

The Center for Computational Biology (CCB) is a multidisciplinary program where biomedical scientists, engineers, and clinicians work jointly to combine modern mathematical and computational techniques to perform phenotypic and genotypic studies of biological structure, function, and physiology in health and disease. CCB has developed a computational framework built around the Manifold Atlas, an integrated biomedical computing environment that enables statistical inference on biological manifolds. These manifolds model biological structures, features, shapes, and flows, and support sophisticated morphometric and statistical analyses. The Manifold Atlas includes tools, workflows, and services for multimodal population-based modeling and analysis of biological manifolds. The broad spectrum of biomedical topics explored by CCB investigators includes the study of normal and pathological brain development, maturation, and aging; the discovery of associations between neuroimaging and genetic biomarkers; and the modeling, analysis, and visualization of biological shape, form, and size. CCB supports a wide range of short-term and long-term collaborations with outside investigators, which drive the center's computational developments and focus the validation and dissemination of CCB resources to new areas and scientific domains.


MISSION
The Center for Computational Biology (CCB), one of the National Centers for Biomedical Computing (NCBCs), is focused on developing and applying tools for 'Computational Atlases', a framework that goes beyond traditional paper or digital atlases by providing computational methods to map bioimaging and related data from multiple subjects into common coordinate systems for group comparisons. The concept of an atlas is naturally adaptable across different kinds of populations, and atlases can reflect multiple modalities of information, including wide ranges of scale and time. Atlases can incorporate complex mathematical models of biological features, statistical methods for analysis and inference on populations, and an increasing spectrum of scientific disciplines. CCB integrates all of these information perspectives using cutting-edge mathematical models, optimized algorithms, and advanced computational infrastructure.
The Computational Atlas must incorporate accurate registration, 1 shape extraction, modeling and analysis, 2 voxel-based 3 and tensor-based morphometry, 4 and spatial-temporal statistics 5 in order to understand the sometimes subtle, distributed, and dynamic changes associated with normal and pathological biological processes of the brain.
A powerful example of the CCB computational atlasing efforts is the development of a 'Manifold Atlas', an integrated biomedical computing environment that combines a workflow framework with new facilities for statistical inference on biological manifolds. These manifolds are basic mathematical models of biological structure, including shapes and flows, which support morphometric and statistical analyses suitable for individual and population comparisons. The manifold atlas enables holistic statistical analyses of shape information, provides an environment for studying associations between biological structure and function in multimodal population studies, and makes it easier to integrate multidisciplinary methods to address complex translational challenges.
A 'manifold' is a mathematical space that, on a sufficiently small scale, resembles Euclidean space. For example, the surface of a brain structure such as the hippocampus is a two-dimensional manifold, while its volume is a three-dimensional (3D) manifold. In neuroimaging, manifolds are used to describe brain structures, 6–9 opening the door to mathematical and computational brain mapping methods for analyzing connectivity, development, and function. For example, using manifolds permits development of new registration methods, such as the Diffeomorphic Neuroanatomical Registration Framework described below. Brain features have been modeled as Riemannian manifolds, that is, manifolds that include a metric, thereby providing geometric structure and permitting definition of geodesics (shortest paths) and curvature. CCB uses Riemannian manifolds to represent parametric surfaces 10 and to define flows between them (eg, Ricci flows and Riemannian fluid flows), 11 12 as well as for shape analysis (eg, spectral embedding) and analysis of high-dimensional diffusion imaging datasets. 13 Shape manifolds are also used in biomedical analysis of brain structures. The most basic shapes are 'curves' (such as tractographic and sulcal–gyral curves), 'surfaces' (such as the outer boundary of the cerebral cortex), and 'volumes' (such as subcortical structures and cortical regions). Shape manifolds can be augmented into higher-dimensional manifolds with biological data such as tissue density and gene expression, as well as 'flows', which can characterize the evolution of shapes over time and therefore represent important biomedical patterns such as neurodevelopment, brain activation, and disease progression.
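The distinction between a geodesic and an ordinary straight-line distance can be made concrete with a toy example. The sketch below is not CCB code; it assumes the simplest possible Riemannian manifold, the unit sphere, and all function names are hypothetical. It compares the shortest path along the surface with the chord through the surrounding 3D space, and shows that the two nearly agree for nearby points, which is the sense in which a manifold locally resembles Euclidean space.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def chord_distance(p, q):
    """Straight-line (Euclidean) distance through the ambient 3D space."""
    return math.dist(p, q)

def geodesic_distance(p, q):
    """Length of the shortest path that stays on the sphere (great-circle arc)."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

north_pole = to_unit_vector(90.0, 0.0)
equator_point = to_unit_vector(0.0, 0.0)
antipode = to_unit_vector(-90.0, 0.0)

# Far-apart points: the surface path is clearly longer than the chord.
print(f"{geodesic_distance(north_pole, equator_point):.4f}")  # pi/2, about 1.5708
print(f"{chord_distance(north_pole, equator_point):.4f}")     # sqrt(2), about 1.4142

# Nearby points (1 degree apart): the two distances almost coincide.
nearby_a = to_unit_vector(0.0, 0.0)
nearby_b = to_unit_vector(0.0, 1.0)
print(geodesic_distance(nearby_a, nearby_b) / chord_distance(nearby_a, nearby_b))
```

On an anatomical surface the metric is not known in closed form, so geodesics are computed numerically on a mesh; the sphere is used here only because its geodesics have an exact formula.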
Atlases play fundamental roles in computational biology, both as unified mathematical models and as intuitive computational environments. By its nature, the CCB manifold atlas has a visual representation, which is vital for many types of biological information, and it includes an array of related maps, each of which associates features to points in some coordinate space. Any parameterized set of data may be viewed as a map. An example is a brain map, which associates brain features with 3D or four-dimensional (4D) coordinates. Biological sequence maps are also examples, mapping molecular information to one-dimensional linear coordinates. Combining these maps makes it possible to answer queries that cut across scales and modalities. The CCB focus is on computational biology of the brain; the brain's complexity is so great that a common computational framework is vital.

TOOLS
The new CCB 'biomorphometry tools' combine methods from differential geometry, Bayesian theory, and statistics on manifolds (figure 1). The resulting biological inferences permit complex analysis of multimodal information about biological structure. We have also developed methods such as 'manifold learning'. 14–16 Statistical inference on biological manifolds can be used for undertaking a variety of tasks: (1) mathematical definition and representation of biological structures; (2) defining an abstract manifold consisting of such representations, incorporating both differential geometric and manifold-learning methods wherever suitable; (3) defining metrics that measure distances on and between manifolds; (4) constructing biological atlases on manifolds using (1)–(3) above; and finally (5) performing population-based statistical analysis of biological parameters represented on manifolds. The aims of the CCB yield a set of end-to-end analytical workflows (see Pipeline below) that permit fusion of features extracted from structural and diffusion images, followed by analyses that answer families of important biological questions introduced by various driving biological projects (DBPs). The CCB Atlasing Toolkit, a suite of workflow modules for atlasing with biological manifolds, includes Pipeline protocols integrating data services, parallel computation resources, analytical packages, workflow processing, and best practices (such as the protocols embodied in workflows in the Pipeline Library and the CCB Workbench).

DRIVING BIOLOGICAL PROJECTS
The CCB promotes and nurtures collaborations with outside groups using two complementary mechanisms to initiate, manage, and advance collaborative projects: long-term DBPs and short-term pilot collaborative projects. In the period 2004–2011, the CCB maintained dozens of DBPs and pilot collaborative projects and supported hundreds of service recipients, outside investigators, and infrastructure users.

DBP summary
Each CCB DBP addresses heterogeneous aspects of computational biology. Their cumulative breadth and diversity support the Center's effort to develop the computational manifold atlas.

DBP impact
Collectively, the CCB DBPs have led to over 220 published peer-reviewed articles, generated six complementary computational atlases, designed dozens of end-to-end computational analysis protocols, and provided thousands of datasets to the scientific community. Examples of significant CCB DBP findings include the following.
1. We made the first time-lapse films of Alzheimer's pathology spreading in the living brain. Our time-lapse maps show the spread of a new compound (FDDNP-PET) that labels amyloid plaques and neurofibrillary tangles in the living brain. 17 This mapping technique has been hailed as a breakthrough in the Alzheimer's disease community, as has the earlier development of the first time-maps of structural brain change in Alzheimer's disease. 18 This type of dynamic 4D map can show where treatments slow a disease 19 and reveal the disease trajectory as it spreads in the living brain.
2. We developed a novel method (figure 2A), based on fluid mechanics and information theory, to track the location and rate of brain degeneration in an individual. 24
3. We are now validating this method in a separately funded large-scale Alzheimer's disease project (ADNI). 25
4. We created the first 3D anatomical brain atlas indexing tests of genetic association of two schizophrenia disease-related DISC1 and TRAX haplotypes with regional cortical gray matter density. 26
5. We investigated genotype–phenotype relationships in schizophrenia (figure 2B) and discovered 22 23 associations between cortical gray matter density, the schizophrenia risk gene DISC1, and alterations in brain structure associated with deletions at the risk locus 22q11.2 (figure 2C).

Figure 1 A schematic of the Center for Computational Biology biomorphometry tools using powerful methods from differential geometry, Bayesian theory, and statistics on manifolds. PCA, Principal Component Analysis.

ACCOMPLISHMENTS
Many quantitative and qualitative metrics can be used to assess the accomplishments of the Center over the past 7 years, including the number of publications, the quality of software tools, the impact of supported collaborative research projects, and the caliber of the trainees. Also relevant are the applications of the techniques and models to new domains and problems. Below we include some specific products that resulted from the CCB research and development efforts. Since 2004, CCB investigators have published 812 manuscripts, including peer-reviewed journal articles, books, book chapters, and conference proceedings and abstracts. Of these, a CCB member was first author on 237 papers; the remainder were authored by someone outside CCB, either in collaboration with CCB or, in 37 cases, without any direct collaboration at all. CCB investigators designed and implemented 75 image processing, shape analysis, tensor modeling, informatics, and visualization software tools and web services, which were distributed over 10 000 times. They supported 112 active collaborations and serviced hundreds of researchers, mentored 478 trainees, and conducted dozens of training courses and educational events. The CCB also distributed large amounts of imaging, phenotypic, and genetics data, designed 90 different data analysis Pipeline protocols, and provided a 1200-core computational grid infrastructure to over 600 users (http://CCB.loni.ucla.edu). Seven collaborative R01 grants grew out of CCB research projects and matured to the point of becoming stand-alone research endeavors.

Datasets
The CCB maintains one of the largest neuroimaging archives in the world, with more than 65 different projects, which comprise multiple species, more than 70 000 image volumes, dozens of imaging modalities, and diverse arrays of data on normal and pathological states from thousands of subjects. In addition, meta-data, derived imaging data, and genetics data are available for many subjects and projects (http://ccb.loni.ucla.edu/resources/ccb-data/).

Mathematical modeling and computational algorithms
CCB has developed a unifying approach for non-linear registration, matching general geometric patterns including landmark points, curves, surfaces, and sub-volumes using implicit level set methods. A distance function-based, non-linear landmark curve-matching algorithm 27 28 with an inverse-consistent elastic energy was introduced to compute deformation fields carrying source landmarks in the form of curves and/or points to homologous landmarks in a target image. This algorithm facilitates non-linear, inverse-consistent, intensity-based registration methods suitable for 3D image volumes 29 30 (figure 3). In addition, we pioneered a method for intrinsic-feature-based shape correspondences 31 and an automated detection algorithm for analysis of sulcal, gyral, and sub-cortical patterns. 32 33 We also designed and implemented two new level-set based techniques, a multilayer and a multilevel level set, for volumetric segmentation of brain imaging data 34 35 and a new algorithm for automatic whole brain segmentation, which was trained and validated on manually segmented data. 36–38

Figure 2 (A) 3D map of brain changes in a dementia patient with posterior cortical atrophy. 20 Percent tissue losses are computed relative to the initial MRI scan of the same patient, revealing disease progression after each 6-month interval. Active right temporal and parietal lobe degeneration is spreading in the brain. Such maps may be used to assess treatment response. (B) Genotype-to-phenotype schizophrenia mapping generated automatically by PubGraph. 21 (C) CCB surface-based cortical thickness maps show regional decreases in 22q11DS. 22 23
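A deformation field produced by non-linear registration is often summarized by its Jacobian determinant, which measures local expansion (values above 1) or contraction (values below 1). The sketch below is an illustration only and not the CCB registration code: it assumes a 2D displacement field stored as nested lists on a regular grid, uses simple finite differences, and all names are hypothetical.

```python
def jacobian_determinant_2d(u, v, spacing=1.0):
    """Jacobian determinant of the map (x, y) -> (x + u, y + v) on a regular grid.

    u[i][j] and v[i][j] are displacements along x (column index j) and
    y (row index i). The grid needs at least 2 rows and 2 columns.
    """
    rows, cols = len(u), len(u[0])

    def d_dx(f, i, j):
        # Central difference in the interior, one-sided at the borders.
        lo, hi = max(j - 1, 0), min(j + 1, cols - 1)
        return (f[i][hi] - f[i][lo]) / ((hi - lo) * spacing)

    def d_dy(f, i, j):
        lo, hi = max(i - 1, 0), min(i + 1, rows - 1)
        return (f[hi][j] - f[lo][j]) / ((hi - lo) * spacing)

    return [[(1 + d_dx(u, i, j)) * (1 + d_dy(v, i, j)) - d_dy(u, i, j) * d_dx(v, i, j)
             for j in range(cols)]
            for i in range(rows)]

# Toy field: every point moves 10% closer to the origin along both axes.
size = 4
shrink_u = [[-0.1 * j for j in range(size)] for i in range(size)]
shrink_v = [[-0.1 * i for j in range(size)] for i in range(size)]

det_map = jacobian_determinant_2d(shrink_u, shrink_v)
# Express the determinant as a percent change in local area.
percent_change = [[100.0 * (d - 1.0) for d in row] for row in det_map]
print(round(percent_change[1][1], 2))  # uniform 10% shrink per axis: about -19
```

For 3D image volumes the same idea applies with a 3x3 Jacobian matrix at each voxel, and the determinant then reflects local volume change rather than area change.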

NCBC developments
CCB has actively participated in many NCBC-wide initiatives and computational infrastructure developments. CCB led the design and development of the NCBC Biositemaps (http://www.Biositemaps.org) and the iTools Resourceome, 39 provided an open-access computational infrastructure for general biomedical computing, participated in many NCBC dissemination and training events, and shared data, tools, and resources via the NCBC framework. Together with the other NCBCs, CCB has organized a number of training events (http://ccb.loni.ucla.edu/training), provided student fellowships, and disseminated valuable digital educational resources, video archives, and research tutorials (http://www.loni.ucla.edu/SVG/).

Pipeline
The CCB Pipeline is a Java-based platform-agnostic graphical workflow environment for design, distributed client-server execution, and validation and community distribution of computational protocols. 40 41 The Pipeline environment enables the sharing and replication of results at multiple institutions and promotes collaborative open science. Figure 4 shows an example of an image registration meta-algorithm implemented completely within the Pipeline environment using heterogeneous types of data, software tools, and services. In addition to computational algorithms, the Pipeline environment also provides access to standardized datasets.
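The general idea of a workflow, in which modules are connected by data dependencies and executed in an order that respects them, can be sketched in a few lines. The example below does not use or reproduce the Pipeline software or its API; it is a toy executor built on Python's standard `graphlib` module, and the module names and processing steps are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

def run_workflow(modules, dependencies, inputs):
    """Run each module once all of its upstream results are available.

    modules:      name -> callable taking a dict of upstream results
    dependencies: name -> set of upstream names (a directed acyclic graph)
    inputs:       name -> data supplied by the user rather than computed
    """
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:
            continue  # user-supplied input, nothing to compute
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = modules[name](upstream)
    return results

# Hypothetical four-step protocol operating on a short list of intensities.
inputs = {"intensities": [12, 40, 7, 55, 31]}
dependencies = {
    "normalized": {"intensities"},
    "mask": {"normalized"},
    "voxel_count": {"mask"},
    "report": {"normalized", "voxel_count"},
}
modules = {
    "normalized": lambda up: [x / max(up["intensities"]) for x in up["intensities"]],
    "mask": lambda up: [x >= 0.5 for x in up["normalized"]],
    "voxel_count": lambda up: sum(up["mask"]),
    "report": lambda up: {"peak": max(up["normalized"]), "count": up["voxel_count"]},
}

results = run_workflow(modules, dependencies, inputs)
print(results["report"])
```

Describing a protocol as an explicit dependency graph is what makes it shareable and repeatable: another site can rerun the same graph on its own inputs and compare results step by step.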

Training and dissemination
The CCB educational and training efforts have involved a wide range of activities, including mentoring and supervision of hundreds of undergraduate, graduate, and postgraduate trainees, scientific presentations at national and international conferences, K-12 instructional events, and organization of research workshops.

ONGOING CHALLENGES AND FUTURE DEVELOPMENTS
The NCBC program is, by all accounts, a major success. Each center, CCB among them, plans and operates with an expectation of 10 years of funding. Creating a successful center requires this level and duration of support to fully realize its goals as stipulated by the program. Furthermore, these are cooperative agreements, and as such the activities and directions of the centers are strongly influenced and in some instances specifically guided by NIH program staff. The challenge then is how to continue this kind of program within a model that requires traditional peer review, with all its shortcomings. Evaluations delivered by committees with incomplete knowledge of the topical areas for each center are a formula for a random outcome. Grouping all NCBC applications into one or two review panels cannot possibly do justice to the diversity of science represented in this program.
The challenge faced by CCB is its very existence. Future developments will depend on the availability of funding.

Figure 3 Example of using the new registration methods and tools to compute Jacobian deformation maps representing non-rigid deformations and the magnitude of the local morphology. Above, Jacobian maps of a patient with Alzheimer's disease between time 1 and time 2 superimposed on the target volumes. Results show inverse consistency (column 3) and stability of the unbiased approach in the absence of physiological changes. Adapted from Yanovsky et al. 24

Figure 4 Image registration meta-algorithm (IRMA) Pipeline workflow applied to ADNI data. Computational tools that solve one specific problem may be integrated via meta-algorithms using the Pipeline environment (http://Pipeline.loni.ucla.edu/). For example, IRMA 42 provides a robust volumetric registration, which often outperforms the individual methods used 43–45 by this meta-algorithm.