Gene expression profiling of preneoplastic liver disease and liver cancer: a new era for improved early detection and treatment of these deadly diseases?

The development of hepatocellular carcinoma (HCC) is a multi-step process associated with changes in gene expression. Currently, several technologies enable global gene expression profiling. The number of studies probing global gene expression profiles of HCC or preneoplastic chronic liver diseases has increased exponentially in recent years. These studies have quickly provided rich information and some additional clues to the genesis of liver cancer. The application of gene expression profiling to preneoplastic liver diseases and HCC is growing in importance and practicality. In this commentary, we review the recent advances in the application of global gene expression profiling to liver cancer, which have provided new insight into the molecular mechanisms underlying the development of HCC. We also discuss the problems related to these new technologies, as well as their contributions and implications. By recognizing the shortcomings, we can reassess our current approaches, which allows us to better design and analyze global gene expression-based experiments. These new approaches will undoubtedly contribute to a better understanding of hepatocarcinogenesis.

Liver cancer is the third most deadly cancer worldwide (estimated 548 554 deaths in 2000), and the fifth in the number of cases (1). Hepatocellular carcinoma (HCC) is one of the most common malignant liver tumors in the world with a high prevalence in Asia and sub-Saharan Africa (2). Recent studies have shown that the incidence of HCC has substantially increased in the USA as well as in other areas including Japan and Europe (3)(4)(5). HCC is one of the few human cancers in which an underlying etiology can be identified in most cases. It is frequently associated with chronic liver diseases including cirrhosis (6). Most of the etiological factors associated with HCC, such as chronic viral hepatitis, alcohol abuse, metabolic disorders or other environmental agents, also lead to cirrhosis (7). However, it is not clear whether these factors induce HCC directly or whether they act indirectly by producing chronic liver injury and cirrhosis.
The molecular mechanisms of hepatocarcinogenesis are not well understood, although aneuploidy and multiple genetic alterations are often present. For example, mutations of p53, Rb and β-catenin have been reported in HCC, while c-myc and cyclin D1 are frequently overexpressed (8)(9)(10)(11)(12). Loss of heterozygosity (LOH) may be associated with the inactivation of tumor suppressor genes. LOH at the 1p, 4q, 6q, 8p, 13q, 16q and 17p loci is frequently reported in human HCC (13)(14)(15)(16). In addition, several growth factors have been implicated in the development of HCC, including transforming growth factor α and β (17,18). Unfortunately, these fragmented findings do not piece together a clear portrait of HCC. This disease has been described as very heterogeneous (19).
With the recent development of gene expression analysis methodologies, gene expression on a global scale can be determined in tissue samples and cell lines. The use of gene expression profiling is particularly important in cancer. This is because the initiation and malignant progression of cancers result from the accumulation and combined effects of many changes in the sequence or expression level of cancer initiation genes.
In this paper, we review recent progress on gene expression profiling of HCC and discuss problems and contributions as well as their implications.

Technologies to analyze global gene expression
Several techniques are now available to monitor gene expression on a genomic scale. Differential display (DD) is the most common technique used in the last 10 years to examine differences in gene expression between two samples (20). This technique identifies differentially expressed mRNAs based on their ability to hybridize with specific primer sequences. In DD, multiple biological samples are amplified in parallel, resolved by electrophoresis and visualized. Several investigators have used the DD technique to analyze hepatocarcinogenesis (21)(22)(23)(24).
Suppression subtractive hybridization (SSH) is a recently developed technology designed for identifying differentially expressed genes (25). One critical feature of this technique is the combination of suppression polymerase chain reaction (PCR) with subtraction. To optimize the subtraction, the cDNA is digested with a restriction enzyme to generate small DNA fragments (~500 nucleotides in length). Specially designed DNA adapters are ligated onto one of the cDNA pools, followed by two rounds of hybridization and PCR. Miyasaka et al. used this technique to determine changes in gene expression profiles in HCC (26).
Representational difference analysis (RDA) is a variant of differential hybridization that incorporates a PCR amplification step (27). The procedure is based on the generation of two complex cDNA populations that are cut, ligated to suitable adapters and subsequently amplified by PCR. The products represent the RNA populations at reduced complexity (therefore, called representations) and are the starting material for the comparison. Differences are accumulated by a combined process of subtraction and kinetic enrichment. After repeated cycles of RDA, the differentially expressed products are cloned and sequenced. This method was used to identify CRG-L1 (musculus cancer related gene-liver 1) in a mouse model of HCC (28).
The DD, SSH and RDA methods provide some sensitivity in identifying novel genes that may be associated with hepatocarcinogenesis. However, these methods are not quantitative and are labor-intensive. The development of Serial Analysis of Gene Expression (SAGE) and cDNA microarray techniques has, for the first time, allowed the quantitative measurement of gene expression on a global scale. SAGE was first described by Velculescu et al. in 1995 (29). In SAGE, a short unique sequence tag is isolated from each gene by a PCR-based strategy. Concatemerized tags are sequenced, and the abundance of these tags provides a measure of the level of gene expression in the starting material. These SAGE tags can be linked to a specific transcript designation in an appropriate database of unique transcripts, such as UniGene. SAGE is essentially an accelerated technique for cDNA library sequencing. Because it requires intensive sequencing, SAGE is not well suited for the analysis of large numbers of samples. However, because SAGE does not require prior knowledge of the pattern of gene expression in a given mRNA source, the same biochemical and bioinformatics procedure can be applied to any sample, given the availability of the appropriate reference database of SAGE tags.
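The tag-counting step described above can be illustrated with a minimal sketch. The tag sequences, the lookup table and the function names below are hypothetical and chosen only for illustration; a real analysis would map tags against a reference database such as UniGene.

```python
from collections import Counter

def tags_per_million(tags):
    """Convert a list of observed SAGE tags into abundances per million tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}

def annotate(abundance, tag_to_gene):
    """Attach a transcript label to each tag; unmatched tags stay 'unassigned'."""
    return {tag: (tag_to_gene.get(tag, "unassigned"), value)
            for tag, value in abundance.items()}

# Hypothetical 10-base tags and a hypothetical tag-to-transcript lookup table.
library = ["AAACCCGGGT"] * 6 + ["TTTGGGCCCA"] * 3 + ["GATTACAGAT"]
lookup = {"AAACCCGGGT": "gene_A", "TTTGGGCCCA": "gene_B"}

abundance = tags_per_million(library)
for tag, (gene, tpm) in sorted(annotate(abundance, lookup).items()):
    print(tag, gene, round(tpm))
```

Because the readout is a simple count, the same procedure applies to any library, and tags without a database match are retained rather than discarded.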
In 1995, Brown et al. at Stanford University published the first paper on cDNA microarrays, describing them as a high-capacity system developed to monitor RNA levels of numerous genes simultaneously, using two-color fluorescence (30). cDNA microarrays consist of thousands of different cDNA clones spotted in an array manner on a glass slide or a nylon membrane. These slides or membranes can be hybridized with two cDNA probes made from two different samples, each fluorescently labeled with spectrally distinct dyes. The ratio of fluorescent signal intensity at any location represents the ratio of corresponding mRNA molecules in the two samples hybridized to the array. In contrast to other techniques, microarrays can only measure the expression of genes that correspond to sequences included in the array fabrication process. The advantage of cDNA microarrays is that they do not require prior knowledge of a cDNA sequence because clones can be used and sequenced later if they prove of interest.
Instead of using one cDNA PCR product to capture targets representing a particular gene of interest, multiple oligonucleotide probes have been designed from various regions along the 3′ end of the gene (31). One advantage of using oligonucleotides for microarray probes is their increased specificity when compared with cDNA clones. Because oligonucleotide arrays are designed and synthesized based on sequence information, physical intermediates such as cloning and PCR are not required. Specific sequences, which are non-overlapping, can be designed to increase the hybridization sensitivity, even with shorter sequences. Furthermore, oligonucleotides can distinguish between single nucleotide polymorphisms and splice variants. Current cDNA microarrays contain up to 32 000 transcripts, and oligonucleotide arrays can contain up to 49 000 unique sequences.
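As a rough illustration of how the two-color intensity ratio is usually handled, the sketch below converts per-spot intensities to log2 ratios. The intensity values are invented, and the median-centering step is a common normalization that is assumed here; it is not described in the text above.

```python
import math

def log2_ratios(test_signal, reference_signal, floor=1.0):
    """Per-spot log2(test/reference); intensities are floored to avoid log(0)."""
    return [math.log2(max(t, floor) / max(r, floor))
            for t, r in zip(test_signal, reference_signal)]

def median_center(values):
    """Subtract the median so that a typical spot has a log ratio of 0."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return [v - median for v in values]

# Hypothetical background-corrected intensities for four spots.
tumor = [800.0, 400.0, 100.0, 1600.0]
reference = [200.0, 200.0, 200.0, 200.0]

raw = log2_ratios(tumor, reference)
print(raw)
print(median_center(raw))
```

Working on the log scale makes a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric, which is convenient for the clustering and classification methods discussed later.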

Examples of molecular profiling of HCC
Hepatocarcinogenesis is a long-term process (19). During the long preneoplastic stage leading to HCC, alterations in gene expression are almost entirely quantitative. Thus, gene expression profiling of preneoplastic stage liver will be indispensable to understanding the molecular mechanism of HCC. Several groups have analyzed global gene expression in preneoplastic liver diseases (32,33). Honda et al. compared the gene expression profiles of chronic hepatitis B and C patients by cDNA microarray (32). SAGE has also been applied to liver disease and HCC (33,34). Because hepatitis B virus (HBV) is a major risk factor contributing to the development of HCC, several groups have compared gene expression profiles in cultured liver cells expressing HBV or the HBV-encoded oncogene, HBx (34)(35)(36). We have performed both microarray and SAGE analyses on freshly isolated human primary hepatocytes transiently expressing HBx (34,35). This system served as an ideal experimental control as the relatively low abundance of HBx transcripts (0.14%) in the HBx-infected hepatocyte library is very similar to the abundance of HBx (0.11%) from the HBV-positive HCC library (T.Yamashita, personal communication). Using SAGE, we have identified 31 novel transcripts that are potential targets for HBx. These studies also allow us to generate the hypothesis that HBx may function as a major regulator in common cellular pathways that, in turn, regulate protein synthesis, gene transcription and protein degradation.
Molecular profiling has been successfully used to identify candidate genes for HCC in human and animal model systems.
We have listed selected references to these studies (Table I).
Several groups have used cDNA microarray to identify genes that regulate the composition of the extracellular matrix and the cytoskeleton. Matrix metalloproteinase 14 (MMP14) and osteonectin (SPARC) are up-regulated in HCC, which is consistent with previous findings (37)(38)(39)(40). MMPs play a crucial role in tumor invasion by degrading basement membranes and the stromal extracellular matrix. Osteonectin is a glycoprotein involved in extracellular matrix remodeling. Large amounts of osteonectin mRNA and protein have been detected in the tumor capsule, in the fibrous bands and along capillaries within HCCs.
Rho B (ARHB) is down-regulated in HCC, but another Ras homolog gene family member, Rho A (ARHA), shows the opposite trend (36)(37)(38)(41). These results are consistent with another recent result showing that the Ras/Erk and the Ras/RhoA pathways negatively regulate cytokine-induced NOS2 in the normal human liver cell line, while RhoB enhances it (42). This is the first demonstration that genes promoting malignant transformation such as Ras and RhoA inhibit NOS2 induction, while genes with tumor suppressor activity such as RhoB enhance the process. Rho E (ARHE), another Ras homolog gene family member, is also down-regulated, as several studies have reported (37,41,43).
Major histocompatibility complex class I C (HLA-C) is upregulated in HCC. This result has been confirmed through two separate reports using different techniques (33,44). Class I genes belong to a large multigene family within the major histocompatibility complex (MHC) including HLA-A, HLA-B and HLA-C. There are several reports about MHC and HCC (45,46). Results of these studies have shown that HLA is upregulated in both viral hepatitis and HCC.
Dyneins are molecular motors that translocate along microtubules. The expression of the cytoplasmic dynein light chain 1 (PIN, ddlc1) gene is enhanced in HCC, and in chronic hepatitis B or C liver (32,44). An anti-apoptotic effect of this gene has been demonstrated recently in a study of Drosophila (47). Furthermore, its interaction with IkappaB (IκB) has been reported, and its involvement in nuclear factor-κB-dependent gene regulation has been suggested (48). Thus, it may be involved, not only in the cell morphology of HCC, but also in carcinogenesis as a result of its anti-apoptotic effect.
SSH analysis of HCC has identified decorin as being down-regulated in HCC (26). Decorin (DCN) is a small proteoglycan, and is known to bind to and inhibit transforming growth factor-β (TGF-β). It may directly interfere with the cell cycle via the induction of the cyclin-dependent kinase inhibitor p21 (49). It has been reported that DCN suppresses tumorigenicity when expressed in colon cancer cells (50). These results suggest that DCN might have an inhibitory effect on HCC tumorigenesis or progression.
Several insulin-like growth factor binding proteins are differentially expressed in HCC and non-cancer samples (38,41,43,53). IGFBP-3 was reported to be a growth suppressor in various pathways. In the IGF receptor-dependent pathway, IGFBP-3 mediates a wide variety of growth suppression signals such as TGF-β, retinoic acid, TNF-α and p53, among others. Reduced expression of IGFBP-3 has been reported in HCC. Also, other family members, i.e. IGFBP-1 and 4, have been shown to be differentially expressed in HCC when compared with normal liver (32,38).

Potential pitfalls and ways to improve expression-profiling techniques
Problems with sample choice and processing
Gene expression profiling of tissues is probably dependent on several factors including individual variation and sample variation. The ethnicity, sex, age and genetic background of a given patient probably affect gene expression. Potential errors caused by those biological variations can be minimized only by increasing the number of subjects included in an analysis.
The selection of an appropriate control as a reference is an important element for microarray study. The reference serves to control variations in the size of corresponding spots on different arrays and variations in sample distribution over the slide (54). In most liver cancer studies, the basic design of gene expression profiling is to compare normal with cancer tissues. The debatable issue is whether adjacent non-HCC liver tissue samples from the same patient should be used for comparison with HCC, or whether a common reference derived from a cell line pool or from pooled normal liver tissues of healthy donors should be used instead. Use of non-cancerous tissues from the same patient can provide a clue for individual variation due to the difference in genetic background. However, tissues adjacent to an area of HCC may not be normal, despite the absence of gross histopathological evidence indicative of cancerous changes in apparently normal tissues. Adjacent tissues near a cancer could be genetically altered or exhibit an altered gene expression profile. There may be a significant difference in the conclusions reached by two similar expression-profiling studies when different references are used.
Liver cancer may be a heterogeneous mixture of different cell types. This heterogeneity can complicate the interpretation of gene expression studies. Thus, sample selection is an important issue. The recently developed technique of laser capture microdissection can be used to isolate a defined cell population from specific areas of tissues under direct microscopic visualization (55). Although this can be a time-consuming process, several groups have been able to apply it to microarray analysis (56)(57)(58). For example, Okabe et al. have used this technique for cDNA microarray analysis of HCC (53). However, laser capture microdissection generates a very small amount of RNA, which then needs to be amplified to attain the sensitivity required by microarray. Two methods have been developed to accommodate this need. One approach uses an RT-PCR based method that can be performed on single cells, but suffers from the observation that cDNA abundance does not correlate with original mRNA levels. The other uses a linear amplification method based on cDNA synthesis by reverse transcription from primers that are tagged with a bacteriophage RNA polymerase promoter sequence (57). This approach has been reported to be reproducible and allows the use of as little as 1-50 ng of total RNA. However, whether amplified RNA samples produced by these techniques are a true representation of the original RNA population still remains to be determined.
We have performed microarray analysis using multiple samples from the same chronic hepatitis B patient (manuscript in preparation). These samples had different histological features under the microscope. Hierarchical clustering analysis of global gene expression was used to compare these samples with other samples from different patients. The samples from the same patient always clustered together and were well separated from other samples of different patients. These results indicate that sample variation is not as significant a problem as variation between individuals, and suggest that gross-dissected liver samples without laser capture microdissection are sufficient for microarray analysis.

Statistical methods
Data management is an essential element in global gene expression profiling. To date, various approaches have been developed for the analysis and exploration of gene expression data, particularly microarray data. However, investigators are confronted with the problem of deciding which expression ratios to regard as significant because there are no standard criteria for the selection of differentially expressed genes. Most investigators typically apply an arbitrary global threshold of a 2-4-fold change for differences in expression that might be considered biologically interesting or significant. While it seems reasonable to assume that the largest changes in mRNA levels are the ones most likely to be biologically significant, there are problems with this approach. Two-fold differences in the ratio of some mRNAs could have more biological impact than 3-fold differences in the expression of other mRNAs. Fold threshold selection does not have any statistical validation (59). For example, in a 10 000 gene array, one would expect to have about 500 non-specific genes that appear differentially expressed by chance at the 95% significance level, regardless of the correlation patterns of the genes (54). Therefore, it is essential to select genes with some statistical confidence to avoid non-specificity.
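The arithmetic behind this example is simply the number of genes multiplied by the per-gene false-positive rate (10 000 × 0.05 = 500). The sketch below reproduces it and, as an assumption going beyond the text, shows two standard multiple-testing corrections (Bonferroni and the Benjamini-Hochberg step-up rule) that are not discussed in this commentary. All function names and P-values are hypothetical.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of null genes called significant at per-gene level alpha."""
    return n_genes * alpha

def bonferroni_threshold(n_genes, alpha):
    """Per-gene P-value cutoff that bounds the family-wise error rate by alpha."""
    return alpha / n_genes

def benjamini_hochberg(p_values, fdr):
    """Indices of genes declared significant by the Benjamini-Hochberg rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose P-value is below its step-up threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

print(expected_false_positives(10_000, 0.05))
print(bonferroni_threshold(10_000, 0.05))

# Hypothetical P-values for five genes.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], fdr=0.05))
```

Either correction replaces an arbitrary fold-change cutoff with a rule whose error rate is stated explicitly, which is the kind of statistical confidence the paragraph above calls for.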
Several statistical algorithms have been developed, which allow some confidence in assigning significance to candidate genes. Statistical analysis of gene expression data has centered on two approaches, unsupervised and supervised algorithms. Unsupervised methods require no prior knowledge of the sample classification. They are geared towards the discovery of patterns in the data unbiased by outside knowledge. One of the most frequently used unsupervised analysis methods is the hierarchical clustering developed by Eisen et al. (60). In contrast, supervised algorithms require the conditions to be associated with labels that provide information about a pre-existing classification. This information comes from outside the gene expression experiment and might include knowledge of disease subtype or tissue origin of a cell type. Supervised algorithms can be used to identify small changes among the groups that cannot be distinguished by unsupervised methods.
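To make the unsupervised approach concrete, the following is a minimal sketch of agglomerative clustering with average linkage and a correlation-based distance, in the spirit of the hierarchical clustering cited above. It is not the published software; the sample names and log ratios are invented, and the choice of 1 minus the Pearson correlation as the distance is an assumption.

```python
import math
from itertools import combinations

def pearson_distance(x, y):
    """Distance defined as 1 minus the Pearson correlation of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    names = list(profiles)
    dist = {frozenset(pair): pearson_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(pair):
        # Mean pairwise distance between members of the two clusters.
        c1, c2 = clusters[pair[0]], clusters[pair[1]]
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical log ratios for four genes in two samples from each of two patients.
profiles = {
    "patientA_1": [2.0, 1.5, -1.0, -2.0],
    "patientA_2": [1.8, 1.6, -0.8, -2.2],
    "patientB_1": [-1.5, -2.0, 2.0, 1.0],
    "patientB_2": [-1.7, -1.8, 2.2, 0.9],
}
print(average_linkage(profiles, n_clusters=2))
```

No class labels enter the calculation, so any grouping that emerges reflects only the similarity of the expression profiles themselves.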
One of the goals of supervised expression data analysis is to construct classifiers, such as linear discriminant analysis, support vector machines, artificial neural networks or K-nearest neighbors (61)(62)(63)(64). To date, supervised analysis algorithms have not been used for HCC cases. Recently, we have compared gene expression profiles between metastatic HCC liver samples and non-metastatic HCC by supervised algorithms (65). We used cDNA microarray to investigate the gene expression profiles of primary and metastatic tumors from surgical specimens of 40 HCC patients. Gene expression profiles were obtained by comparing primary or metastatic HCC with its corresponding adjacent non-tumorous tissues. While unsupervised methods did not identify any difference, the supervised compound covariate predictor (CCP) analysis was used to classify primary and metastatic samples, and to identify genes that discriminate between these two groups (66). CCP analysis correctly grouped 19 of 20 primary HCC samples with metastatic lesions and nine of 13 primary HCC without metastasis. The cross-validated misclassification rate was significantly lower than expected by chance (P = 0.002). In contrast, CCP analysis did not lead to a statistically significant difference between primary and metastatic HCC sample pairs (P = 0.328). These results indicate that the alterations of metastasis-associated genes may occur in primary lesions with metastatic potential. This analysis may allow us to build a training module useful for the diagnosis and prognostic assessment of advanced HCC patients and to identify metastasis-associated candidate genes that may be used as potential targets for molecular therapy. These gene expression based classifiers offer a potential for adjuvant therapy.
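A compound covariate predictor can be sketched as follows, under stated assumptions: genes are selected by a two-sample t-statistic, each sample is scored by the t-weighted sum of its selected genes, and the class boundary is the midpoint between the mean scores of the two classes. The data, the cutoff and the function names are hypothetical, and this is an illustration of the general technique rather than the analysis pipeline used in the study above. Gene selection is repeated inside each leave-one-out fold so that the held-out sample never influences the classifier.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def train_ccp(samples, labels, t_cutoff):
    """Return a predict(sample) function built from genes with |t| >= t_cutoff."""
    weights = {}
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        b = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        t = t_statistic(a, b)
        if abs(t) >= t_cutoff:
            weights[g] = t

    def score(sample):
        return sum(w * sample[g] for g, w in weights.items())

    c1 = mean(score(s) for s, lab in zip(samples, labels) if lab == 1)
    c0 = mean(score(s) for s, lab in zip(samples, labels) if lab == 0)
    threshold = (c1 + c0) / 2
    return lambda sample: 1 if score(sample) > threshold else 0

def loocv_errors(samples, labels, t_cutoff):
    """Count misclassified samples under leave-one-out cross-validation."""
    errors = 0
    for i in range(len(samples)):
        predict = train_ccp(samples[:i] + samples[i + 1:],
                            labels[:i] + labels[i + 1:], t_cutoff)
        errors += predict(samples[i]) != labels[i]
    return errors

# Hypothetical log ratios: rows are samples, columns are three genes.
samples = [[2.1, -1.0, 0.3], [1.8, -1.2, -0.2], [2.4, -0.8, 0.1],
           [0.2, 1.1, 0.0], [-0.1, 0.9, 0.4], [0.3, 1.3, -0.3]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = with metastasis, 0 = without (illustrative)
print(loocv_errors(samples, labels, t_cutoff=2.0))
```

In practice the cross-validated error count would be compared with its distribution under random label permutations to obtain a P-value of the kind reported above; that step is omitted here for brevity.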

Early detection markers and diagnosis
Most HCC patients currently identified in clinics are at an advanced stage of the disease with a very poor prognostic outcome. Therefore, early detection, using a tumor marker, is essential to reduce the fatality of this disease. An ideal tumor marker should be specific for that particular type of cancer, and produced only by it and not by any nonmalignant condition. It should be absent or present in low amounts in normal individuals, but become elevated in the presence of a very small tumor. The biological half-life should be short, and the quantity of the marker should correlate with tumor load so that it can be used in staging the disease and monitoring an individual's response to treatment. It should also be easily and rapidly measurable by available laboratory methods. While alpha-fetoprotein is the best marker presently available and remains the most widely used, it fulfils only some of these criteria.
The use of gene expression profiling for cancer diagnosis has been demonstrated in acute leukemia samples by oligonucleotide microarray analysis (67). Using unsupervised learning, leukemia samples were neatly clustered into the two known subsets solely on the basis of gene expression. In addition, using supervised learning, gene sets differentially expressed in the two subsets were used to classify a group of known samples into the correct categories. This study provided strong evidence that tumor expression profiles can be used for cancer classification. Thus far, this application has not been used for HCC. Recently, we have performed microarray analysis on several chronic liver diseases, and have applied supervised statistical algorithms, including leave-one-out cross-validated CCP and k-nearest neighbor classifications. These approaches have allowed us to successfully classify these preneoplastic liver diseases according to their risk for developing HCC (68). The development of such molecular signatures may provide a platform that would allow an early diagnosis of HCC patients, and the genes in this classifier may help us obtain a better understanding of the initiation of HCC.

Outcome prediction and drug development
Another promising application of global gene expression profiles in oncology is in the development of new anticancer treatments. One obvious application relies on the identification of new genes and pathways involved in cancer progression that could become therapeutic targets. Large-scale analysis of the changes in gene expression, induced by treatments or by candidate drugs, may reveal the participation and behavior of both known and previously unsuspected genes. Such studies will allow the identification of new molecular targets downstream of the primary one, and will also help to predict the potential toxicity of treatments. Using cDNA arrays, Huang et al. have monitored the differences in gene expression between untreated melanoma cells and the same cells treated with a combination of recombinant interferon and the antileukemic compound (69). They have identified the expression changes that correlate with and potentially control these processes.
Molecular characterization of cancers by the global gene expression profile approach may allow for the earlier detection of cancers at a stage when they may be responsive to treatment. It may also be used to identify high-risk individuals or screen relatives in families with inherited cancers. Gene expression-profiling techniques could distinguish malignant from benign tumors when histological methods are inconclusive. They may add prognostic information to traditional methods of patient staging to improve the prediction of clinical outcome. They may also be able to predict the response to treatment so that treatment approaches can be individualized. In addition, they may lead to the development of novel treatment strategies.

In general, HCC is considered to be a fatal disease because of its poor prognosis, with the exception of a few patients who have received liver transplantation. This is largely because of the lack of a method for early diagnosis, and the lack of information on the phenotypic changes associated with the development of HCC. Changes in gene expression profiles during the genesis of HCC are also largely unknown. The comparison of global gene expression patterns between normal liver and tumor samples may allow us to understand better the molecular mechanism of hepatocarcinogenesis.

Conclusions
Global gene expression profiling makes it possible, for the first time, to assess molecular changes associated with liver cancer on a global scale. Newly developed technologies such as cDNA microarray and SAGE show a promising future for the study of liver cancer. However, at present, there is a lack of comparability among these studies, which still precludes the identification of consensus changes that would depict a true portrait of HCC on a global scale. This stems not only from the lack of standardization, but also from the lack of a good biostatistical approach and the use of different techniques. A clear objective for the design of global gene expression profiling, in combination with good statistical tools, can offer a solid foundation that will yield a meaningful outcome. Two emerging focuses should be considered in the field of liver cancer study. The first need is to develop a gene-expression-based model that allows for identifying individuals with a potential for developing HCC. One way is to compare gene expression profiles between the high-risk and low-risk groups of chronic liver disease patients with cirrhosis, as well as HCC patients, and to identify a set of significant genes that can be used to classify high-risk individuals with a potential for developing HCC. This approach may offer an early diagnosis for HCC. The second need is to develop a molecular fingerprint that would help to distinguish HCC patients with a potential for developing metastasis or a recurrence after curative surgery. The high mortality of HCC patients is mainly because of the occurrence of metastasis and recurrence. Distinguishing HCC with metastatic potential may offer patients with a poor prognosis an opportunity to receive adjuvant therapy. These studies also have the potential to identify novel diagnostic markers and candidate therapeutic targets for effective therapy.
The ultimate goal of these efforts will be to decrease HCC aggressiveness and increase patient survival.