Comparative functional genomics for identifying models of human cancer

Genetically modified mice with overexpressed and/or deleted genes have been used extensively to model human cancer. However, it is uncertain to what extent the mouse models reproduce the corresponding cancers in humans. We have compared the global gene expression patterns in human and mouse hepatocellular carcinomas (HCCs) to identify the mouse models that most extensively reproduce the molecular pathways in the human tumors. The comparative analysis of gene expression patterns in murine and human HCC indicates that certain genetic mouse models closely reproduce the gene expression patterns of human HCC, while others do not. Identification of mouse models that reproduce the molecular features of specific human cancers (or subclasses of specific human cancers) promises to accelerate both the understanding of the molecular pathogenesis of cancer and the discovery of therapeutic targets. We propose that this method, comparative functional genomics, could be effectively applied to the analysis of mouse models for other human cancers.

Introduction
Cancer develops via a complex, multistage cellular and molecular process. Each stage includes genetic and/or epigenetic events that progressively transform normal cells into highly malignant derivatives. Several regulatory pathways are altered, sequentially or simultaneously, during tumor progression, resulting in growth autonomy, insensitivity to growth inhibition and apoptosis, augmented neoangiogenesis, invasion and metastatic dissemination (1). Although many approaches, including genome-scale studies, provide insights into some of the stages of human tumorigenesis, a sequential analysis of tumor development in humans is difficult. Experimental animal models of carcinogenesis, in particular mouse models, have allowed the stages of neoplastic development to be examined in considerable detail. However, it is uncertain to what extent the mouse models reproduce the molecular features that characterize human cancers. We have hypothesized that, if the regulatory elements of evolutionarily related species are conserved, the gene expression signatures reflecting similar phenotypes in those species would also be conserved (2). To test this hypothesis, we investigated whether a comparison of the global expression patterns of orthologous genes in human and mouse hepatocellular carcinomas (HCCs), an approach we term 'comparative functional genomics', would identify similar and dissimilar tumor phenotypes (2). Here, we discuss both the challenges and the potential value of applying comparative functional genomics to assess the accuracy of mouse models. Based on our earlier results with HCC, we propose that comparative functional genomics could be used to assess the accuracy of mouse models for cancers in other tissues.

Challenges in using mice to model cancer in humans
Technological advances during the last two decades have facilitated the generation of genetically engineered mice that mimic genetic alterations frequently observed in human cancers. By selectively driving the overexpression or deletion (knockout) of genes of interest in mice, tumor development can be induced in various tissues, and the similarity of the resulting tumors to their human counterparts can be tested (3,4). Indeed, many mouse models of human cancers have been generated on the assumption that the same genetic alterations will lead to similar biological outcomes in both species. Although these genetically altered mouse models have answered many critical questions about the mechanisms of tumor progression (5,6), this approach has not always worked. This is well illustrated by the failure of mice to develop retinoblastoma when one or both alleles of the Rb gene are mutated or deleted. Loss of Rb function alone is not sufficient to produce retinoblastoma in mice (7–10); the additional loss of the Rb-related tumor suppressor p107 is required (11). In contrast, inactivating mutations in both alleles of RB are necessary for the development of retinoblastoma in humans (12). Although the mouse model of retinoblastoma is not a precise molecular model of the human tumor, it has provided insight into the multiplicity of molecular mechanisms of tumor development. The retinoblastoma model, among other examples, clearly demonstrates the challenges of generating and selecting accurate mouse models of specific human cancers, given the subtle differences in the molecular regulatory circuitry of the two species. Another major challenge to the wide use of mouse models of specific human cancers is that not all types of human cancer can yet be reproduced in mice. It is, however, reasonable to expect that this problem will be solved in the near future through an improved understanding of cancer biology and technological advances in the capacity to control gene expression in cells of specific organs and tissues.
Although certain mouse tumor models reproduce cellular alterations and histopathological patterns similar to those of their human counterparts, it is uncertain whether these models recapitulate the molecular regulatory pathways that are altered in the human tumors. Furthermore, human cancers are evidently heterogeneous with respect to natural history, molecular pathogenesis and response to treatment. Tumor heterogeneity adds further complexity to the identification of 'best-fit' mouse models for human cancers. There is, however, increasing evidence that distinct genomic alterations in tumors, reflected in their gene expression patterns, correlate with differences in clinical outcome in patients with cancer. Gene expression profiling of diffuse large B cell lymphoma in humans provides one of the best examples of molecularly distinct subgroups of a cancer displaying different clinical behaviors (13,14). Similar observations have recently been made in breast, prostate and lung cancers (15–19), and the number of clinically distinct subgroups within cancers classified as a single type on morphology alone will undoubtedly grow with further analysis of global gene expression patterns. The identification of new subgroups of human cancers introduces additional challenges to using mouse models to analyze aspects of human cancers that cannot readily be studied in humans. As more gene expression patterns of specific mouse tumors are analyzed, many are likely to show a molecular heterogeneity that is not reflected in tumor morphology. This dilemma may be addressed by directly comparing the global gene expression patterns of mouse and human tumors using the 'comparative functional genomics' approach.

Mouse models of hepatocarcinogenesis
HCC is the fifth most common cancer in the world, accounting for an estimated half a million deaths annually (20). Although HCC is a major cancer in Southeast Asia and sub-Saharan Africa, its incidence is much lower in Western Europe and the USA. However, the incidence and mortality rate of HCC have doubled in the USA over the past 25 years, and this upward trend is expected to continue over the coming decades (21,22). Although the agents that cause most HCC in humans are known (e.g. HBV and HCV infection, aflatoxin exposure, alcoholism and non-alcoholic steatohepatitis), and much is known about the cellular and molecular alterations in human HCC (23), the molecular pathogenesis of HCC is still not well understood (24–27).
HCC is also a common tumor in mice, and its morphology in mice resembles that in humans, although the causative factors differ (28). Analyses of the gene expression patterns of chemically induced HCC in animal models have not yet elucidated the molecular pathogenesis of this tumor, although these studies suggest that diverse molecular pathways may lead to common morphologic features (29). In addition to chemically induced liver tumors, HCC has also been produced in genetically engineered mice generated to decipher the molecular pathways that lead to this tumor (Table I).
Liver-specific overexpression of genes such as Myc, Tgfa, E2f1, Ccnd1 and Hras1(G12V) induces HCC in mice (30–35). Induction of HCC by the expression of viral proteins in the mouse liver has also been documented (35–43). Gene knockout mice have been generated to study the participation of selected tumor suppressor genes in hepatocarcinogenesis (44–46). Chemical induction of HCC in mice with diethylnitrosamine (DENA) is a widely used model for investigating multiple aspects of hepatocarcinogenesis (47), and provides a useful baseline for comparing HCC models induced by exogenous agents with genetically engineered models. Since HCC develops in each of these models, it is reasonable to presume that all of them could be useful for elucidating the molecular pathogenesis of HCC, including aspects of the regulation of HCC development in humans. However, individual mouse models cannot be equally useful for identifying the molecular pathways involved in all variants of human HCC, owing to species differences in liver physiology and the divergent impact of cancer-associated genes in the two species (48). The challenge, therefore, is to identify the mouse models that most closely reproduce the different molecular subtypes of human HCC.

Comparative genomics and comparative functional genomics
Comparative studies of the genomic structure of evolutionarily related and unrelated species are useful for predicting gene structures and the number of active genes in each species (49). Such comparative studies have also identified cis-acting elements that regulate gene expression (50). The availability of genome sequences from many species has notably advanced comparative genomics in particular, and the understanding of evolutionary biology in general. The neutral theory of molecular evolution, regardless of the controversy it has engendered among evolutionary theorists, provides a framework for identifying functional DNA sequences in the genomes of different species (51,52). The central hypothesis of the neutral theory is that the vast majority of mutations in DNA sequences are neutral with respect to the fitness of an organism. While deleterious mutations are rapidly removed by selection, neutral mutations persist and follow a stochastic process of genetic drift through a population. Functional (non-neutral) DNA sequences are therefore conserved during evolution, whereas neutral sequences accumulate mutations. This difference has allowed the identification of both protein-coding sequences and functional noncoding sequences in a genome (49,50,53). We argue that if the regulatory elements of evolutionarily related species are conserved, it is reasonable to hypothesize that the gene expression signatures representing similar phenotypes in different species could also be conserved. This hypothesis forms the basis of our attempt to identify aberrant phenotypes reflecting molecular pathways that are conserved during the development of cancer in mice and humans.
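The conservation argument above can be made concrete with a small sketch. The Python below assumes a human and a mouse sequence that have already been aligned to equal length (gaps written as '-'). It scores each fixed-size window by the fraction of identical aligned bases and reports the windows at or above a cut-off, a crude version of scanning for conserved and therefore possibly functional elements. The function names, window size and threshold are illustrative assumptions rather than values from the cited studies, and published methods estimate a neutral substitution rate instead of relying on raw identity.

```python
from typing import List


def window_identity(aligned_a: str, aligned_b: str, window: int) -> List[float]:
    """Fraction of identical, non-gap aligned positions in each sliding window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    if window <= 0:
        raise ValueError("window must be positive")
    # 1 where both sequences carry the same base; gap columns never count as matches
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(aligned_a: str, aligned_b: str,
                      window: int = 10, threshold: float = 0.9) -> List[int]:
    """0-based start positions of windows whose identity reaches the threshold."""
    scores = window_identity(aligned_a, aligned_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    # Toy alignment (invented): first 10 columns identical, last 5 all different
    human = "ACGTTGCAAC" + "CCCCC"
    mouse = "ACGTTGCAAC" + "GGGGG"
    print(conserved_windows(human, mouse, window=5, threshold=1.0))  # [0, 1, 2, 3, 4, 5]
```

Only the windows lying entirely inside the matching stretch are reported; lowering the threshold would admit windows that overlap the divergent tail.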
J.-S.Lee, J.W.Grisham and S.S.Thorgeirsson

We have recently investigated whether comparing the gene expression patterns of HCC from mice and humans would permit the direct identification of common aberrant molecular pathways involved in hepatocarcinogenesis. The involvement of common molecular pathways in subsets of HCC in both species could allow the identification of mouse models that provide a 'best fit' for the subclasses of human HCC that we found in a previous analysis of global gene expression (54). Gene expression patterns of mouse HCC were obtained from seven mouse models: two chemically induced (ciprofibrate and DENA), four transgenic (targeted overexpression of Myc, E2f1, Myc/E2f1 and Myc/Tgfa in the liver) and one knockout (Acox1−/−). Gene expression patterns of HCC from 91 patients were analyzed in an independent study that uncovered two distinct subgroups of HCC related to patient survival; this association was validated by independent supervised methods (54). Orthologous human and mouse genes present in both data sets were selected, and the gene expression data were integrated after standardizing the relative expression levels separately for each species (Figure 1). In a hierarchical clustering analysis of the integrated data, the gene expression patterns of HCC from Myc, E2f1 and Myc/E2f1 mice were most similar to those of the better-survival group of human HCC, while the expression patterns of Myc/Tgfa and DENA-induced mouse HCC were most similar to those of the poorer-survival group (Figure 2). These results suggest that these two classes of mouse models might more closely recapitulate the molecular patterns of the two subclasses of human HCC. In contrast, the gene expression patterns of HCC that develop in Acox1 knockout and ciprofibrate-treated mice were least similar to those observed in either subclass of human HCC.
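The integration and clustering steps described above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the study's code: it assumes each species' data are held as gene → {sample → log ratio} dictionaries with complete rows, sample names that are distinct across species, and a one-to-one mouse-to-human ortholog map. The names `standardize`, `integrate`, `pearson` and `average_linkage` are hypothetical. Standardization is applied per gene across the samples of each data set (our reading of "separately in each data set"), and the clustering uses 1 minus the Pearson correlation with average linkage, a common choice for expression data that may differ from the settings used in the original analysis.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Assumed layout: gene symbol -> {sample name -> log ratio versus normal liver}.
# Rows are assumed complete, and sample names distinct across the two species.
Matrix = Dict[str, Dict[str, float]]


def standardize(data: Matrix) -> Matrix:
    """Scale each gene to mean 0 and SD 1 across the samples of ONE data set."""
    out: Matrix = {}
    for gene, row in data.items():
        values = list(row.values())
        sd = pstdev(values)
        if sd == 0:
            continue  # an invariant gene cannot be standardized, so drop it
        mu = mean(values)
        out[gene] = {sample: (v - mu) / sd for sample, v in row.items()}
    return out


def integrate(mouse: Matrix, human: Matrix,
              orthologs: Dict[str, str]) -> Dict[str, List[float]]:
    """Join both species on (one-to-one) ortholog pairs, standardizing each separately.

    Returns sample name -> standardized values over the shared ortholog pairs.
    """
    pairs = sorted((m, h) for m, h in orthologs.items() if m in mouse and h in human)
    mouse_z = standardize({m: mouse[m] for m, _ in pairs})
    human_z = standardize({h: human[h] for _, h in pairs})
    # keep only pairs that survived standardization in BOTH species
    pairs = [(m, h) for m, h in pairs if m in mouse_z and h in human_z]
    samples: Dict[str, List[float]] = {}
    for m, h in pairs:
        for sample, v in list(mouse_z[m].items()) + list(human_z[h].items()):
            samples.setdefault(sample, []).append(v)
    return samples


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def average_linkage(samples: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """Agglomerative clustering, distance = 1 - Pearson r, average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = {name: [name] for name in samples}
    merges: List[Tuple[str, str, float]] = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                d = mean(1 - pearson(samples[a], samples[b])
                         for a in clusters[ka] for b in clusters[kb])
                if best is None or d < best[0]:
                    best = (d, ka, kb)
        d, ka, kb = best
        clusters[f"{ka}+{kb}"] = clusters.pop(ka) + clusters.pop(kb)
        merges.append((ka, kb, d))
    return merges


if __name__ == "__main__":
    # Invented toy values (not study data): mouse_A mirrors human_1, mouse_B mirrors human_2
    mouse = {"Afp": {"mouse_A": 3.0, "mouse_B": 1.0}, "Alb": {"mouse_A": -1.0, "mouse_B": 1.0},
             "Gpc3": {"mouse_A": 0.0, "mouse_B": 2.0}, "Myc": {"mouse_A": 2.0, "mouse_B": 0.0}}
    human = {"AFP": {"human_1": 5.0, "human_2": 1.0}, "ALB": {"human_1": 0.0, "human_2": 4.0},
             "GPC3": {"human_1": 1.0, "human_2": 3.0}, "MYC": {"human_1": 2.0, "human_2": 0.0}}
    orthologs = {"Afp": "AFP", "Alb": "ALB", "Gpc3": "GPC3", "Myc": "MYC", "Tgfa": "TGFA"}
    for merge in average_linkage(integrate(mouse, human, orthologs)):
        print(merge)
```

On this toy input each mouse model is first merged with the human tumor sharing its pattern of up- and down-regulated orthologs (distance 0) before any within-species merge, and the unmatched `Tgfa` entry is ignored because it is absent from the mouse data. Standardizing each species separately, rather than pooling raw ratios, keeps differences in platform scale from dominating the distances.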
HCC development in these two models is driven by peroxisome proliferation in the liver (44,54–56). These results suggest that hepatocarcinogenesis induced by peroxisome proliferation in mice progresses along molecular pathways that do not occur frequently in humans, consistent with previous studies suggesting that humans are insensitive to the hepatotoxic effects of peroxisome proliferation (57).

Conclusions and perspectives
Although transgenic and knockout mouse models have improved our insight into the mechanisms of human cancers, in most instances we still rely on observed phenotypes, associated with limited sets of genotypes, to correlate tumor development in the two species. Aligning orthologous genomic sequences from different species has allowed the identification of evolutionarily conserved sequences that might be functional regulatory elements (49,50,53). It is therefore reasonable to postulate that similar gene expression signatures exemplify phenotypes that are conserved across species. Our work has demonstrated that gene expression signatures reflecting similar phenotypes are indeed conserved in mice and humans during the development of HCC, and that interspecies comparison of global gene expression patterns could identify the mouse models that best mimic human HCC. Application of this approach, 'comparative functional genomics', may similarly identify the most relevant mouse models for cancers of other tissues.

Table I
Knockout models
Acox1−/−          129/Ola, C57BL/6    10–15    (44)
Lkb1+/−           129/SvJ             10–12    (46)
Mdr2(Abcb2)−/−    129/OlaHsd          18       (45)
WHV, woodchuck hepatitis virus; AAT, α-1-antitrypsin; MUP, major urinary protein; LFABP, liver fatty acid binding protein; HCA, hepatocellular adenoma; MT, metallothionein; AT3, antithrombin 3.

Comparative functional genomics in mouse models
However, there are critical issues that must be considered before applying this approach. Because basal levels of gene expression differ between the two species, the analysis and interpretation of the integrated gene expression data can be complicated. Using the normal counterpart of the cancerous organ (or the specific cell types involved) as the reference for measuring gene expression ratios could lower the risk of misinterpretation. Furthermore, standardization of the data sets requires large numbers of human samples and multiple mouse models; integrating data from a few human tumors and one or two mouse models will not provide sufficient discriminatory power to identify the most relevant mouse models.

Fig. 1. Comparative functional genomics. Gene expression data of mouse (seven models) and human HCC tissues (two subclasses) were collected independently using spotted dual-channel microarray platforms. In both series of experiments, normal livers were used as a reference to measure the gene expression ratios. Before the two independent data sets were integrated, orthologous genes present on both microarrays were selected for further analysis. After an initial data reduction to focus on genes whose expression varied non-trivially across the samples, the gene expression ratios were standardized to a mean of zero and a standard deviation of one separately in each data set. By applying unsupervised (hierarchical clustering) and supervised (prediction models) analyses of the gene expression patterns, the mouse models that best or least mimic human conditions could be identified. The 'best fit' mouse models can then be used to test hypotheses on tumor progression generated by the analysis of cross-species gene expression patterns or from other experimental data. These models will also be extremely valuable for testing potential therapeutic targets identified in human studies and for preclinical drug trials.
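The point about using normal tissue as the reference can be illustrated with a short sketch. The Python below assumes positive, background-corrected intensities for each tumor sample and a single normal-liver reference value per gene. It converts them to log2 ratios and then applies a variance filter that stands in for the "initial data reduction" mentioned in the Fig. 1 legend. The function names and the SD cut-off of 0.5 are hypothetical, since the criterion actually used is not stated here, and the numbers are invented to show why ratios help: a gene whose basal level is 100-fold higher in mouse liver than in human liver yields the same log ratio in both species when it rises by the same factor in tumors.

```python
import math
from statistics import pstdev
from typing import Dict, List


def log_ratios(tumor: Dict[str, List[float]],
               normal: Dict[str, float]) -> Dict[str, List[float]]:
    """log2(tumor / normal reference) for every gene and tumor sample.

    Intensities are assumed positive and background-corrected; genes without
    a positive reference value are skipped.
    """
    return {
        gene: [math.log2(t / normal[gene]) for t in values]
        for gene, values in tumor.items()
        if normal.get(gene, 0.0) > 0
    }


def filter_variable(ratios: Dict[str, List[float]],
                    min_sd: float = 0.5) -> Dict[str, List[float]]:
    """Keep genes whose log ratios vary across samples (hypothetical SD cut-off)."""
    return {g: v for g, v in ratios.items() if len(v) > 1 and pstdev(v) >= min_sd}


if __name__ == "__main__":
    # Invented numbers: basal expression differs 100-fold between species, but in
    # tumors of both species the gene rises 2-fold (sample 1) and 8-fold (sample 2).
    mouse = log_ratios({"Afp": [2000.0, 8000.0], "Alb": [500.0, 500.0]},
                       {"Afp": 1000.0, "Alb": 500.0})
    human = log_ratios({"AFP": [20.0, 80.0]}, {"AFP": 10.0})
    print(mouse["Afp"], human["AFP"])      # [1.0, 3.0] [1.0, 3.0]
    print(sorted(filter_variable(mouse)))  # ['Afp'] (the unchanged Alb is dropped)
```

Because only within-species ratios are carried forward, absolute intensities, which depend on species-specific basal levels and on each array platform, never enter the cross-species comparison.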

The potential advantage of a comparative functional genomics analysis is the ability to select the animal models that best express the molecular features of human cancers. The discovery of subclasses of human tumors, such as HCC, with homogeneous patterns of gene expression that correlate with survival suggests that appropriate animal models with similar patterns of gene expression can be identified for each subclass. Indeed, we have identified mouse models of HCC that display gene expression patterns similar to each of the two subclasses of human HCC (2). Identification of appropriate mouse models of human cancers by comparative functional genomics should accelerate the discovery of relevant therapeutic targets and the testing of potential therapeutic agents. Establishing a molecular relationship between mouse and human cancers may also provide a basis for investigating the early genomic and molecular events in neoplastic development. We have analyzed gene expression patterns only in the final stage of tumor (HCC) development and classified them in terms of survival and functional categories of genes. Although these aberrantly expressed genes are essential molecular elements of the dysregulated cellular physiology that produces the cancer phenotype, these results do not address the specific genetic aberrations that set the affected cell(s) on the molecular pathway to cancer, or the subsequent genomic and molecular changes that enable the cells to progress toward the final malignant phenotype. Thus, future genomic studies of early, intermediate and late precancerous lesions will be necessary to identify the specific molecular pathways involved and the components of those pathways that are aberrant.
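Matching mouse models to human subclasses can also be framed as a supervised assignment. The sketch below is a nearest-centroid rule: it averages the standardized profiles of the human tumors in each subclass and assigns a mouse model to the subclass whose centroid it correlates with best, or to no subclass when no correlation reaches a cut-off. It is a deliberately simple stand-in for the prediction models mentioned in the Fig. 1 legend and not necessarily the method used in the study; the function names, the subclass labels, the 0.3 cut-off and all vectors are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def centroids(human: Dict[str, List[float]],
              labels: Dict[str, str]) -> Dict[str, List[float]]:
    """Mean standardized profile of the human tumors in each subclass."""
    groups: Dict[str, List[List[float]]] = {}
    for sample, vector in human.items():
        groups.setdefault(labels[sample], []).append(vector)
    return {cls: [mean(col) for col in zip(*vecs)] for cls, vecs in groups.items()}


def assign(model: List[float], cents: Dict[str, List[float]],
           min_r: float = 0.3) -> Optional[str]:
    """Best-correlated subclass for one mouse model, or None if no r reaches min_r."""
    best_cls, best_r = None, min_r
    for cls, centroid in cents.items():
        r = pearson(model, centroid)
        if r >= min_r and (best_cls is None or r > best_r):
            best_cls, best_r = cls, r
    return best_cls


if __name__ == "__main__":
    # Invented standardized profiles over four ortholog pairs (not study data)
    human = {"h1": [1.0, 1.0, -1.0, -1.0], "h2": [1.0, 0.5, -1.0, -1.0],
             "h3": [-1.0, -1.0, 1.0, 1.0], "h4": [-1.0, -0.5, 1.0, 1.0]}
    labels = {"h1": "better_survival", "h2": "better_survival",
              "h3": "poorer_survival", "h4": "poorer_survival"}
    mouse_models = {"model_X": [2.0, 1.0, -1.0, -2.0],
                    "model_Y": [-1.0, -1.0, 1.0, 1.0],
                    "model_Z": [1.0, -1.0, 1.0, -1.0]}
    cents = centroids(human, labels)
    for name, profile in mouse_models.items():
        print(name, assign(profile, cents))
```

With these toy vectors, `model_X` and `model_Y` are assigned to opposite subclasses and `model_Z` to neither, loosely mirroring how some mouse models matched one human subclass while others resembled neither. The explicit "no match" outcome is the design choice that lets a poorly fitting model be flagged rather than forced into the nearest class.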
Using the conserved gene expression patterns in humans and animal models as a 'foundation', future studies should identify the range of aberrantly expressed genes, abnormal molecular species and pathways that lead to this outcome. Because preneoplastic lesions are difficult to obtain from human patients, these studies will need to be conducted in parallel with tissues from animal models. Only when this is accomplished will we be able to understand the molecular pathogenesis of cancer and how closely mice (and other experimental animals) model cancer in humans.