Abstract

Genetic variation, through its effects on gene expression, plays a crucial role in phenotypic variation and disease susceptibility. Recent studies from our group and others have integrated a number of resources and technologies to assess several aspects of genome variation affecting gene expression. Some of these large-scale mapping studies involving expression quantitative traits have recently been reviewed [Gibson, G. and Weir, B. (2005) The quantitative genetics of transcription. Trends Genet., 21, 616–623; de Koning, D.J. and Haley, C.S. (2005) Genetical genomics in humans and model organisms. Trends Genet., 21, 377–381], with particular attention to the statistical issues. In this review, we compare allele-specific expression studies in human samples (primarily lymphoblastoid cell lines from the CEPH HapMap panel), as a prelude to a discussion on study design issues and sources of variation, in order to propose the steps required to build a detailed map of cis-acting regulatory variation in the human genome. Obtaining panels of tissues from large numbers of individuals remains an important limitation. We also conclude that there is insufficient knowledge as to the feasibility of comprehensive studies of trans-acting variation in the human genome.

INTRODUCTION

Since the advent of expression microarrays one decade ago, they have been used in investigations ranging from simple technical feasibility experiments in model organisms to complex study designs involving multiple human tissues. In the past 2 years, the expression profiling literature has been particularly interesting for human geneticists, because of numerous reports characterizing the role of human genome polymorphism in the context of large-scale investigations of gene expression. Sequencing of the human genome (1,2) and characterization of its variation (3) in human populations have provided the necessary resources for large-scale genetic studies of human expression traits. Understanding the genetic control mechanisms of gene expression will provide insight into functional elements of the human genome, and ultimately reveal polymorphisms that modify the risk of human diseases.

Integrating genotypic data from gene-mapping studies and expression profiling to identify susceptibility genes for complex traits has recently been achieved in mouse disease models (4,5). Human studies combining gene expression profiles and genome scans in cases and controls and/or families with disease phenotypes have not yet progressed this far, because of technical, analytical and cost issues. Instead, the research community has focused on the expression traits themselves. Studies to dissect cis- or trans-acting variation are often performed separately, using different methodologies, and are usually kept separate in the analysis and interpretation of results. Cis-acting variants affect transcript synthesis or stability in an allele-specific manner and are close to the gene(s) that they regulate, whereas trans-acting variants are not close (usually on different chromosomes) and can affect both alleles of a gene.

Cis-acting variants are commonly thought to involve regulatory elements such as promoters and enhancers, which may lie immediately upstream of the gene, but can also be found hundreds of kilobases away. Until recently, studies of cis-acting variants were restricted to the in vitro comparison of polymorphic promoter constructs transfected in cell lines (6). The more recent in vivo approaches, usually based on mapping expression traits or relative allelic expression (also called allelic imbalance), are evaluated in large panels of cell lines or tissues. Genes are studied in their native sequence environment (and haplotype context), allowing the detection of regulatory factors that are acting over short and long range. Promoter construct studies remain useful to validate cis-acting effects discovered by hypothesis-free methods. Cis-acting variants (or markers in tight linkage disequilibrium with them) are valuable markers for association studies of appropriate human phenotypes. In cases where the regulatory polymorphism is identified, it may provide indirect clues of the transcription factors involved in the regulation of a specific gene and may help identify novel regulatory motifs.

Variation in trans-acting control of gene expression is more difficult to establish. No general methods dedicated to large-scale studies of trans-acting effects exist. Current approaches are based on genome-wide mapping of expression levels (or eQTLs). If a trans-acting effect is mapped to a chromosomal locus, the underlying variant may be a coding variant or regulatory variant in a gene involved in the transcriptional control of the gene(s) that is (are) affected. Validation of a trans-acting variant requires relatively complex functional studies. To date, none of the suggested trans-acting variants underlying human expression traits have been conclusively validated. An exciting outcome of proving a trans-acting variant is the possibility for discovery of gene-regulatory networks, i.e. single variants could influence the activities of a whole pathway and reveal previously unknown gene expression networks. Genome-wide mapping studies in yeast and mice have demonstrated ‘linkage hotspots’, suggesting the existence of ‘master regulators’ governing gene expression of multiple unlinked loci (7,8). Intuitively, it is possible that a single regulatory variant with subtle trans-acting influences on expression of a group of genes could lead to more pronounced effects on cellular function than a single cis-acting variant in a downstream effector gene.

Source tissues (or the lack thereof)

Studying the heritability of human expression traits would optimally include a relatively large number of samples (i.e. several hundreds), similar to many other complex trait studies. Standardized tissue banks of related or unrelated individuals do not yet exist. As a consequence, most human eQTL studies have been carried out in transformed cell lines, principally Epstein–Barr virus (EBV) immortalized lymphocytes or lymphoblasts (LCLs). The LCLs are often obtained from publicly available repositories; there is no control over the number of passages or exact immortalization method each line has been subjected to. Known problems related to prolonged LCL culture are reduction of cellular mosaicism (clonality) (9,10) and induction of aberrant DNA methylation (11). Two recent papers have used the allele-specific transcripts found in dbEST to infer allelic expression from EST data by comparing representation of alleles within a heterozygous EST library (12) or by comparing allele frequencies in multiple EST libraries to known population allele frequencies (13). The genes shown to have significant deviations in allelic EST representation in these two studies may harbor common cis-acting polymorphisms and, as EST libraries are derived from various tissues, these data provide evidence for cis-acting effects in multiple human tissues as opposed to LCLs.

Genome-wide linkage or association for total expression trait mapping

Five recent papers investigated tens to thousands of expression phenotypes in LCLs. Two notable studies (14,15) applied expression profiling for assessment of thousands of gene expression traits on DNA microarrays or GeneChips. The studies utilized a similar design, in which the identification of variable expression traits was followed by whole genome linkage analyses. The samples were from the CEPH family collection (16). The two studies applied only partially overlapping sample sets and different marker panels as well as analytical approaches; it is therefore not unexpected that the results appear different. Nevertheless, the degree of disparity between the studies has raised discussion about the validity of the data (17,18). Below, we discuss the issues we believe important to understanding the discordances observed among these studies. A separate study applied the same concept for a smaller set of genes (n=41) using real-time polymerase chain reaction (PCR) for the determination of expression traits (30). The expression traits with cis-acting linkages (also the strongest linkages) of Morley et al. were further studied by a whole genome association (WGA) approach (20) using 60 unrelated CEPH samples (HapMap-CEU) that were included in building the human haplotype map (3). These same samples were also used by a study measuring over 600 expression traits using bead arrays on fiber-optic bundles (21).

The detailed analysis of the results from these studies has been the subject of earlier reviews and we will only focus on common themes emerging from the data; some details are tabulated in Table 1. Monks et al. identified hundreds of genes showing significant heritability of their expression levels, but only a small fraction of these showed highly significant linkage (0.1% of all studied genes). A much greater proportion of investigated genes were found to have linkage in the study by Morley et al. (close to 2% of all genes). A common feature to both studies was the enrichment of cis-linkages in the top hits, at times explaining over 50% of trait variance. Overall, cis-linkages accounted for a third of all hits. Interestingly, the WGA-based study (21) found only cis-acting signals (0.5–1.5% of all genes). One feature that probably reduced the hit rate (after correction for multiple testing) in the latter study was the unrestricted selection of target genes: the panel of genes was selected from ENCODE (19) regions as well as chromosome 21, thus LCL expressed genes were not enriched in any way. The inclusion of 60% of all genes likely incorporated many ‘noisy’ genes, as in our experience RT–PCR routinely detects 50–65% of unselected transcripts in LCLs. Finally, in the small linkage study by Deutsch et al., 10% of studied genes reached significance (2.5% cis-linkages), but the authors noted that many nominally significant trans-acting signals failed the permutation tests.

Risch and Merikangas (22) suggested that common QTLs with modest contribution to trait variance are best detected by association methods. Although the deficiencies of the eQTL mapping studies based on genome-wide linkage (lack of replication and small sample size) may not do justice to linkage methodologies, it does seem likely that the now technically (and analytically) feasible WGA approach (23) will attract greater attention in mapping common eQTLs. In addition to the theoretical advantages outlined by Risch and Merikangas, the attractiveness of WGA lies in the easier sample collection as well as more straightforward interpretation and validation of detected signals, which are already ‘fine-mapped’. The performance of the whole genome linkage approach for discovery of cis-acting effects was studied by association-mapping-based analysis of 384 cis-linkages identified earlier (20). Replication of linkage data was obtained for ∼20% of genes, and in the case of the strongest 27 cis-acting signals detected by linkage, the WGA analysis yielded corroborating evidence for ∼50% of genes. Reasons for non-replication of cis-linkages include allelic heterogeneity of cis-eQTLs and insufficient sample size (affecting the statistical power) in the follow-up association study as well as false-positive linkages. These reasons are similar to the ones explaining the unfortunate lack of replication for many complex trait mapping studies (24).

Discordances between studies measuring total expression

We examined the correlations between expression studies carried out in different laboratories and on different expression platforms. We focused on expression traits (genes) that demonstrated linkage (14) or association (20,21) or showed significant heritability (15). We also included data from our own LCL expression studies on GeneChips (U133, Affymetrix) that overlap with these sample sets. Overall correlations between replicates within the same study, between platforms, and on the same platform between groups and between different probes can be seen on density plots (Fig. 1A). The best correlations (i.e. the curves with increased density towards high r2-values) are observed with RNA replicates on the same platform (a and b, red and orange lines) and the worst correlations (i, black line) in random comparisons of data. Surprisingly, correlations of the bead-array data with other platforms (f and h, turquoise and brown lines) appear to be quite poor. The best correlation between groups (independent cell culture) is seen with our LCL data compared with Morley et al. (same platform) data (d, green line). A more detailed look is provided in Figure 1B, which separates correlations between studies into genes with highly significant cis-acting linkages/associations (14,20) and genes with highly significant trans-acting linkages. It appears that even strong trans-acting signals may be reproduced poorly, an observation that has been discussed earlier (18) and partly attributed to the poor performance of linkage statistics for correlated expression traits common in human expression profiles (25). Similarly, in a WGA study (21), it was observed that false-positive trans-signals appeared to be due to a few outlier data points for some expression traits. One explanation could be an environmental confounder creating a spurious trans-linkage or association, thus only the strongest cis-acting signals may seem replicable.
We also compared correlations in expression levels between the four independent LCL data sets in the case of genes with validated versus non-validated cis-linkage by association (20): for the genes with cis-acting linkages that were validated, the average maximum r2-values were higher than for non-validated genes (0.56 versus 0.42, P=0.02). This suggests that cell culture is a major source of variation and false-positive signals. A systematic approach for exploring stability of expression traits in LCLs is urgently needed to appropriately evaluate the increasing number of large-scale experiments on LCL panels. Introduction of cell culture replicates into study designs will likely uncover many heritable expression traits buried in the ‘technical variation’ and allow identification of truly heritable expression traits. Finally, the lack of replication between different expression platforms is partly due to probe placement in different parts of the genes. A given expression platform may be robust to small changes (SNPs) under the probe sequence (26,27), but only assaying all exons of a gene (28) would ensure picking up variable isoform expression, which can be a source of apparent lack of replication of true cis-acting variation (29). In fact, analysis of patterns of expression level correlation between studies for the 15 cis-acting eQTLs replicated by association (20) reveals multiple examples of genes with probe comparisons showing both high and low (or no) concordance, suggesting an isoform-specific effect. An example of this phenomenon is given in Figure 2. The RefSeq annotation of the phosphoribosyl pyrophosphate amidotransferase gene (PPAT) is based on mRNA sequences with a long 3′-UTR, but some full-length mRNA sequences support the existence of a shorter mRNA species.
Two probe sets interrogate expression of the PPAT gene on the Affymetrix U133 GeneChip (used by us in measuring LCL expression), whereas only one of these probe sets is included on the GeneChips used in the published studies (14,20). One probe set (209434_s_at) is specific to the U133 GeneChip and to the longer mRNA, whereas the other (209433_s_at) is common to the long and short PPAT gene models and is included on both Affymetrix GeneChips. Comparing expression data from the same LCLs in independent groups (Morley et al. data versus data produced in our own group) and within our samples reveals that the correlation data support a heritable effect for expression of the PPAT short isoform only.
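As an illustration only, the between-study comparisons discussed above can be sketched as a per-gene squared Pearson correlation computed over the cell lines that two data sets share. The data layout (a dictionary of gene to sample to expression level), the function names and the minimum of three shared samples are assumptions made for this sketch; the text does not specify how the r2-values were actually computed.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length vectors with at least 3 samples")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant trait carries no correlation information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def between_study_r2(study_a, study_b, min_shared=3):
    """Per-gene r2 over the samples shared by two studies.

    study_a, study_b: dict mapping gene -> dict mapping sample id -> level
    (hypothetical layout chosen for this sketch).
    """
    result = {}
    for gene in study_a.keys() & study_b.keys():
        shared = sorted(study_a[gene].keys() & study_b[gene].keys())
        if len(shared) < min_shared:
            continue  # too few shared cell lines to estimate a correlation
        result[gene] = pearson_r2(
            [study_a[gene][s] for s in shared],
            [study_b[gene][s] for s in shared],
        )
    return result
```

Applying the same function to two probe sets of one gene within a single data set gives the kind of within-study probe comparison described for PPAT.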

Screening for cis-acting effects by total versus allelic expression approach

The total expression studies discussed above involve massively parallel expression assays on DNA chips (14,20), microarrays (15) or bead arrays on fiber-optic bundles (21). The main advantage of these platforms is the relative maturity of the technologies: the complete (known) transcriptome can be assayed at once, and common sources of error can be avoided by carefully following quality control procedures and by replication of the hybridizations. Non-parallel methods for expression analyses have been applied to smaller sets of genes (30), but are not amenable to genome-wide studies. If markers close to the gene of interest are correlated with its expression level, a putative cis-acting effect is inferred.
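The inference of a putative cis-acting effect from total expression can be illustrated with a simple regression of expression level on allele dosage at a nearby marker. This is a minimal sketch under an additive model, which is an assumption of the example and not a method attributed to any of the cited studies; the function name and the 0/1/2 dosage coding are likewise hypothetical.

```python
def genotype_regression(dosage, expression):
    """Least-squares regression of expression on allele dosage (0, 1 or 2).

    Returns (slope, r2): the expression change per allele copy and the
    fraction of trait variance explained by the marker.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage and expression for >= 3 samples")
    mean_d = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((d - mean_d) ** 2 for d in dosage)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosage, expression))
    slope = sxy / sxx
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r2


# Hypothetical data: six cell lines, two per genotype class.
slope, r2 = genotype_regression(
    [0, 0, 1, 1, 2, 2],
    [1.0, 1.2, 2.0, 2.2, 3.0, 3.2],
)
```

In a genome-wide setting the same test is repeated for many marker and gene pairs, so the resulting statistics need correction for multiple testing before any signal is called significant.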

Allelic expression assays (31,32) are optimal for detecting cis-acting differences, as each allele serves as an internal control for the other, and trans-acting effects or environmental conditions that differentially influence gene expression among samples should not interfere. Only cis-acting changes in the relative expression of alleles yield reproducible differences between allelic abundances of transcripts. By measuring the ratio of alleles, even subtle cis-acting differences can be revealed, even if feedback control of gene expression dampens the effect on total expression levels between genotypic groups (Fig. 3). The drawbacks of the allelic imbalance approach are the lack of validated high-throughput assays for human genes, the limitation to samples that are heterozygous, and the fact that non-heritable factors (such as epigenetic events) may influence allelic representation. The number of informative heterozygotes can be increased by using pre-mRNA (including non-coding introns) (10) or by using the HaploChIP method (33). The relative strengths and weaknesses of the total versus allelic expression assays for detection of cis-acting (heritable) effects are summarized in Table 2.
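The internal-control logic of allelic expression can be made concrete with a small sketch. It assumes that the allele signals measured in cDNA of a heterozygote are normalized to the signals from genomic DNA of the same sample, where the two alleles are present 1:1; this normalization, the function names and the 1.5-fold threshold are assumptions of the example and are not taken from the studies cited above.

```python
from math import log2


def allelic_log_ratio(cdna_a, cdna_b, gdna_a, gdna_b):
    """log2 of the A:B transcript ratio in a heterozygote.

    The cDNA ratio is divided by the genomic DNA ratio of the same sample
    to remove assay bias between the two alleles (assumed normalization).
    """
    if min(cdna_a, cdna_b, gdna_a, gdna_b) <= 0:
        raise ValueError("allele signals must be positive")
    return log2((cdna_a / cdna_b) / (gdna_a / gdna_b))


def shows_imbalance(log_ratio, threshold=log2(1.5)):
    """Flag a sample whose allelic ratio departs from 1:1.

    The 1.5-fold cut-off is an illustrative choice, not a published standard.
    """
    return abs(log_ratio) >= threshold


# Two hypothetical heterozygotes with different total transcript levels
# (300 versus 150 signal units) but the same 2:1 allelic ratio.
high_total = allelic_log_ratio(200, 100, 100, 100)
low_total = allelic_log_ratio(100, 50, 100, 100)
```

Both samples give the same log ratio even though their total expression differs twofold, which is the sense in which trans-acting or environmental influences on overall transcript level should not interfere with the allelic measurement.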

Mapping cis-acting effects detected in allelic expression assays to regulatory haplotypes

We recently showed the relative ease of mapping causal regulatory variants to common human haplotypes (29). The basis of our study was a set of 73 genes, which had earlier evidence of allelic expression (10,13). We were able to validate allelic expression in 88% of genes (n=64) using a sensitive sequence-based method (13) and multiple assay designs per gene. We applied a systematic mapping approach utilizing the whole genome SNP data available for the HapMap (CEU) LCLs along with the necessary phase data afforded by the ‘trio’ design of the sample panel. Our study showed that the patterns of allelic expression seen in the 64 genes were partly or completely explained (based on gene-based permutation tests) by SNPs/haplotypes in the vicinity of the gene (±100 kb) for approximately 50% (n=33) of studied genes.
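A permutation test of the kind mentioned above can be sketched as follows. The example asks whether samples heterozygous at a candidate regulatory SNP show larger allelic imbalance than samples homozygous at that SNP, and obtains a P-value by shuffling the heterozygosity labels. The test statistic, the function name and the input layout are assumptions made for illustration; this is not the gene-based procedure used in (29), whose details are not given here.

```python
import random


def permutation_p(abs_log_ratios, het_at_snp, n_perm=10000, seed=0):
    """One-sided permutation P-value for greater mean imbalance in samples
    heterozygous at a candidate SNP than in samples homozygous at it.

    abs_log_ratios: absolute allelic log ratio per sample
    het_at_snp: True/False per sample, heterozygous at the candidate SNP
    """
    if len(abs_log_ratios) != len(het_at_snp):
        raise ValueError("one heterozygosity label is needed per sample")
    n_het = sum(1 for h in het_at_snp if h)
    if n_het == 0 or n_het == len(het_at_snp):
        raise ValueError("both genotype groups must be non-empty")

    def mean_difference(labels):
        het = [v for v, h in zip(abs_log_ratios, labels) if h]
        hom = [v for v, h in zip(abs_log_ratios, labels) if not h]
        return sum(het) / len(het) - sum(hom) / len(hom)

    observed = mean_difference(het_at_snp)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = list(het_at_snp)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_difference(labels) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)
```

A gene-based version would repeat this over every SNP or haplotype within the ±100 kb window and compare the best observed statistic with the best statistic in each permuted data set, which accounts for the number of markers tested per gene.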

Validation of mapped traits

The current state of studies of human expression traits is reminiscent of early disease association studies (34), which applied small sample sizes and no replication in independent samples. The criteria for accepting an expression trait study will likely evolve rapidly, but as gene expression levels can be viewed as ‘simple’ complex traits (as compared with human disease phenotypes), molecular genetic methods may allow validation without extensive sample sizes and multiple cohorts. Elegant perturbation experiments knocking out or inactivating a gene of interest can be carried out in mouse and other model organisms, and these can be held as the ‘gold standard’ for validation of trans-acting effects. Obviously, such experiments are challenging and sometimes unfeasible for validation of trans-acting effects observed for human genes. In contrast, validation of cis-eQTLs is straightforward by allele-specific expression assays, but has only been applied systematically in yeast (35), showing a 52–78% validation rate of self-linkages. Our recent study also investigated cis-acting traits discovered by allele-specific expression assays by using total expression association and yielded a validation rate of ∼50% (29). It is evident from these and other studies that even strong evidence from one approach may lead to wrong conclusions and that complementary data from total and allelic expression assays (10,14,20,33) are beneficial for the detection of true heritable cis-acting effects. Identification of a common ‘regulatory haplotype’ is thus straightforward, but such haplotypes may include multiple correlated DNA variants and hinder the identification of the specific regulatory variants that are functional. Cis-acting haplotypes can span tens to hundreds of kilobases including hundreds of SNPs (20,29). Isolated examples of mapped functional variants are published (20,36) but systematic approaches to discover the causative regulatory variant(s) are lacking.

CONCLUSION

A realistic goal in the immediate future will be the creation of validated and comprehensive lists of cis-acting polymorphisms in the human genome. Achieving this will require an extension of current approaches to larger sample sets and tissue panels. The latter is challenging as panels of human tissues for such studies are scarce for most tissues, apart from cells obtained from peripheral blood samples. A concerted plan to develop this resource with the assistance of physicians, surgeons, pathologists, tissue banks and ethicists is required to overcome the lack of human tissues for expression trait studies.

The simple correlations we present in this review underscore the importance of replication at all stages of the experimental design. Most expression studies in LCLs have applied replicate RNA samples, but no study applied independent cultures in screening for heritable expression traits and only one study used separate cultures for the validation of cis-acting effects (29). Although we note that this recommendation differs from a reported comment that ‘inter- and intra-cell line variation was not significantly different’ in an experiment involving transformation of the same lymphocytes at different times and repeated expression assays in the independent cultures (30), this statement was supported by studies limited to one individual only.

We propose that scaling up studies of cis-acting regulatory variants is timely, but we also recommend that studies of trans-regulation remain exploratory, given the technical and statistical limitations and uncertainties. The focus on the cis-regulatory variation of the human genome remains an ambitious but realistic one, because tools are readily available for large-scale validation and mapping, and because cis-acting signals have appeared stronger in studies to date and are thus amenable to detection in a moderate sample size.

This catalogue will complement the intensive studies of non-coding functional elements (19,37) and increase our understanding of the multiple mechanisms controlling gene expression. A comprehensive study will undoubtedly produce collections of common functional polymorphisms across the genome. The influence of regulatory variation in the human genome is still unknown with regard to the basic mechanisms of action and the number of cis-acting variants. The spectrum of mechanisms by which cis-acting polymorphisms influence gene expression has yet to be fully understood: transcriptional control, message stability, relative isoform expression, etc. Whole genome linkage/association-based studies of total expression have suggested that 1–3% of genes (20,21) harbor common cis-acting variants; allelic imbalance studies show a higher proportion of genes (up to 5%) providing evidence for common heritable effects (29). The true number of variants is likely much higher because all studies to date have applied small sample sizes and a single tissue or cell type, and there has been no attempt to cover all coding exons containing SNPs. Thus, even if there are only a few validated regulatory haplotypes/SNPs derived from studies to date, it is likely that they will be more common than structural polymorphisms (38) and may approach the number of common amino-acid changing SNPs (39), the functionality of which is more challenging to establish on a large scale (40).

ACKNOWLEDGEMENTS

T.J.H. is the recipient of a Clinician-Scientist Award in Translational Research by the Burroughs Wellcome Fund and an Investigator Award from CIHR.

Conflict of Interest statement. The authors declare there is no conflict of interest.

Figure 1. (A) Correlation of expression levels of heritable expression traits between studies. A density plot depicting correlations between various expression data sets using LCL panels, for genes with heritable expression or with demonstrated linkage/association in at least one of the listed studies. The graph shows r2 density plots for each comparison. a: RNA replicates on Affymetrix GeneChips (14,20); b: RNA replicates on Illumina BeadArrays (21); c: same RNA hybridized to different Affymetrix probe sets in the same gene; d: same samples hybridized to same Affymetrix probe sets in different laboratories (Morley et al. [14] data versus our own LCL GeneChip data); e: same samples hybridized to Agilent arrays and Affymetrix GeneChips (Monks et al. [15] versus Morley et al. [14] data); f: same samples hybridized to Illumina BeadArrays and Affymetrix GeneChips (Stranger et al. [21] versus Cheung et al. [20] data); g: same samples hybridized to Agilent arrays and Affymetrix GeneChips (Monks et al. [15] versus our own LCL GeneChip data); h: same samples hybridized to Illumina BeadArrays and Agilent microarrays (Stranger et al. [21] versus Monks et al. [15] data); i: random correlations between data. (B) Comparison of correlations between cis- and trans-linkages or associations. The red line shows average expression level correlations between all overlapping LCL expression studies for genes included in top cis-acting linkages (LOD 5.3) in the data set by Morley et al. The black line shows average expression level correlations between studies for genes included in top cis-acting associations in the data set by Stranger et al. The blue line shows average expression level correlations between studies for genes included in top trans-acting linkages (LOD 5.3) in the data set by Morley et al.
The density plot suggests that the genes with trans-linkages are less likely to show reproducible expression (r2-values are lower) across studies compared with the cis-hits identified by linkage. Furthermore, as noted in correlations depicted in (A), the Stranger et al. data appear to reproduce poorly with other expression platforms/studies.


Figure 2. An example of a probe location effect explaining discordant data. The PPAT gene (validated in the Cheung data set) may either appear consistent (bottom left graph) or inconsistent (bottom middle graph) between data sets (Morley et al. and an in-house data set) as a result of using different probe sets that tag different portions of the PPAT message. The probe sets (bottom right graph) do not yield consistent data even within the same data set, suggesting that only the short isoform of PPAT is expressed in LCLs.


Figure 3. An example of an autosomal locus harboring a cis-acting polymorphism causing relative overexpression of ‘A-allele’ in a gene with feedback control of expression. The levels of total RNA are identical between genotypic groups due to the strict feedback control of RNA output that does not interact with the cis-acting polymorphism (i.e. affects both alleles equally). The cis-acting polymorphism can be discovered by allelic expression measurements in the heterozygous (middle) AB-sample regardless of trans-acting feedback effects. In a different physiological state (altering the feedback mechanism), the genotypic groups may also express the alleles at different levels leading to phenotypic consequences.


Table 1. Summary of studies applying genome-wide linkage or association mapping for detection of eQTLs

| | Morley et al. (14) | Monks et al. (15) | Cheung et al. (20) | Stranger et al. (21) | Deutsch et al. (30) |
| --- | --- | --- | --- | --- | --- |
| Samples | 195 individuals (14 CEPH families); 95 unrelated individuals (CEPH grandparents) for variance estimation | 167 individuals (15 CEPH families) | 57 individuals (unrelated CEPH parents, HapMap CEU sample set) | 60 individuals (unrelated CEPH parents, HapMap CEU sample set) | 132 individuals (10 CEPH families) + HapMap CEU sample set in validation |
| Expression assay | GeneChip (Affymetrix), ∼8500 genes | Microarray (Agilent), ∼24 000 genes | GeneChip (Affymetrix), ∼8500 genes | BeadArrays (Illumina), custom expression panel, ∼630 genes | TaqMan (Applied Biosystems) real-time RT–PCR, 41 genes (Chr 21) |
| Trait selection approach and method for mapping | Genes with variance ratio >1 selected and genome-wide linkage carried out | Variable genes identified and genome-wide linkage + tests for heritability carried out | Genes showing cis-linkage studied by refined SNP mapping or WGA | Most variable and expressed genes selected for WGA analysis | Variable genes were assessed for heritability and genome-wide linkage carried out |
| Number of genes subjected to mapping | 3554 variable expression traits | 2430 variable expression traits | 374 genes subjected to refined mapping, 27 strongest cis-linkages to WGA | 374 genes, for which WGA or ‘cis-analysis’ was carried out | 19 genes with heritable expression studied |
| Number of eQTLs reported (cis/trans) | 142 at LOD 5.3 (27/110), 984 at LOD 3.4 | 20 genes at LOD 6.3 (8/12), 132 at P<0.0005 (25/107) | 65 with cis-association at P<0.001; in WGA analysis 15/27 strong cis-linkages supported by association data | Six (3/6) significant in WGA after Bonferroni and three (3/0) by applying permutation; cis-analysis: 10 genes significant; ‘gene-based’ cis-significance reached for 63 (19 by chance) | Four (1/3) reaching genome-wide P<0.05 |
| Method for validation/replication | Strongest cis-linkages tested for association by typing additional SNPs + allelic expression tests | No follow-up | Transient transfection and HaploChIP assay | No follow-up | Independent sample set (HapMap CEU) tested for cis-association |
| Number of validated or replicated eQTLs (% of total) | 14/17 cis-linkages showing association (10% of strong or 1.5% of all linkages); two genes tested for allelic expression | NA | 1/15 strong cis-associations validated (7% of strong or <2% of all reported associations) | NA | 1/1 cis-linkage validated by association (25% of all reported linkages) |
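The percentages in the last row of Table 1 can be roughly reproduced by dividing the number of validated eQTLs by the number reported. The short Python sketch below does this under our own reading of which counts serve as denominators (for example, 142 and 984 linkages for Morley et al.); that pairing is an assumption, and the helper name pct is hypothetical.

```python
def pct(validated, reported):
    """Validated eQTLs as a percentage of reported eQTLs."""
    return 100.0 * validated / reported


# Denominators below are assumed from the 'Number of eQTLs reported' row.
checks = {
    "Morley, strong linkages (14 of 142)": pct(14, 142),
    "Morley, all linkages (14 of 984)": pct(14, 984),
    "Cheung, strong cis-associations (1 of 15)": pct(1, 15),
    "Cheung, all cis-associations (1 of 65)": pct(1, 65),
    "Deutsch, all linkages (1 of 4)": pct(1, 4),
}

for label, value in checks.items():
    print(f"{label}: {value:.1f}%")
```

Under these assumed denominators the results fall close to the rounded figures given in the table.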
Table 2. Characteristics of total expression versus allelic expression assays in detection of heritable cis-acting effects

| Total expression | Allelic expression |
| --- | --- |
| + Commercially available platforms for genome-wide studies | − Validated high-throughput methods not available |
| + All samples informative | − Only heterozygotes for a marker within the transcript (or in the vicinity of the gene when using HaploChIP) are informative |
| − Comparison of genotypic groups across samples (environmental noise) | + Comparison of alleles within a sample (internally controlled method) |
| + Epigenetic influences (such as imprinting) do not create false positives | − Other cis-acting mechanisms (such as imprinting) can also lead to unequal representation of alleles |
| − Feedback control may mask effects | + Even subtle effects can be detected regardless of feedback (unless there is an interaction between a cis-acting variant and a trans-acting control mechanism) |
| − trans-acting loci in linkage disequilibrium with the gene of interest may create false-positive signals | + Only cis-acting effects can be detected |
| + Population-based samples can be used for fine mapping of detected signals | − Phased haplotypes required for localization of cis-acting signal |
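To give a rough sense of the ‘only heterozygotes are informative’ limitation listed in Table 2, the sketch below estimates the expected number of informative samples for a single transcribed marker. It assumes Hardy-Weinberg proportions, which the table itself does not state; the function name and the example allele frequency are hypothetical.

```python
def expected_heterozygotes(n_samples, minor_allele_freq):
    """Expected number of heterozygous samples under Hardy-Weinberg proportions (2pq)."""
    if not 0.0 <= minor_allele_freq <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    p = minor_allele_freq
    return n_samples * 2.0 * p * (1.0 - p)


# Example: a panel of 60 unrelated individuals and an assumed marker frequency of 0.2.
print(expected_heterozygotes(60, 0.2))
```

With these assumed inputs only about a third of the panel would be heterozygous, which is why allelic expression studies lose power for low-frequency markers even when the total expression assay can use every sample.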

References

1. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
2. Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
3. Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., Donnelly, P.; International HapMap Consortium (2005) A haplotype map of the human genome. Nature, 437, 1299–1320.
4. Mehrabian, M., Allayee, H., Stockton, J., Lum, P.Y., Drake, T.A., Castellani, L.W., Suh, M., Armour, C., Edwards, S., Lamb, J. et al. (2005) Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits. Nat. Genet., 37, 1224–1233.
5. Sladek, R. and Hudson, T.J. (2006) Elucidating cis- and trans-regulatory variation using genetical genomics. TIGS, 22, in press.
6. Rockman, M.V. and Wray, G.A. (2002) Abundant raw material for cis-regulatory evolution in humans. Mol. Biol. Evol., 19, 1991–2004.
7. Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb, J.R., Cavet, G. et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature, 422, 297–302.
8. Yvert, G., Brem, R.B., Whittle, J., Akey, J.M., Foss, E., Smith, E.N., Mackelprang, R. and Kruglyak, L. (2003) Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet., 35, 57–64.
9. Migeon, B.R., Axelman, J. and Stetten, G. (1988) Clonal evolution in human lymphoblast cultures. Am. J. Hum. Genet., 42, 742–747.
10. Pastinen, T., Sladek, R., Gurd, S., Sammak, A., Ge, B., Lepage, P., Lavergne, K., Villeneuve, A., Gaudin, T., Brandstrom, H. et al. (2004) A survey of genetic and epigenetic variation affecting human gene expression. Physiol. Genom., 16, 184–193.
11. Hannula, K., Lipsanen-Nyman, M., Scherer, S.W., Holmberg, C., Hoglund, P. and Kere, J. (2001) Maternal and paternal chromosomes 7 show differential methylation of many genes in lymphoblast DNA. Genomics, 73, 1–9.
12. Lin, W., Yang, H.H. and Lee, M.P. (2005) Allelic variation in gene expression identified through computational analysis of the dbEST database. Genomics, 86, 518–527.
13. Ge, B., Gurd, S., Gaudin, T., Dore, C., Lepage, P., Harmsen, E., Hudson, T.J. and Pastinen, T. (2005) Survey of allelic expression using EST mining. Genome Res., 15, 1584–1591.
14. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S. and Cheung, V.G. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature, 430, 743–747.
15. Monks, S.A., Leonardson, A., Zhu, H., Cundiff, P., Pietrusiak, P., Edwards, S., Phillips, J.W., Sachs, A. and Schadt, E.E. (2004) Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet., 75, 1094–1105.
16. Dausset, J., Cann, H., Cohen, D., Lathrop, M., Lalouel, J.M. and White, R. (1990) Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics, 6, 575–577.
17. Gibson, G. and Weir, B. (2005) The quantitative genetics of transcription. Trends Genet., 21, 616–623.
18. de Koning, D.J. and Haley, C.S. (2005) Genetical genomics in humans and model organisms. Trends Genet., 21, 377–381.
19. ENCODE Project Consortium. (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306, 636–640.
20. Cheung, V.G., Spielman, R.S., Ewens, K.G., Weber, T.M., Morley, M. and Burdick, J.T. (2005) Mapping determinants of human gene expression by regional and genome-wide association. Nature, 437, 1365–1369.
21. Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., Lyle, R., Hunt, S., Kahl, B., Antonarakis, S.E., Tavare, S. et al. (2005) Genome-wide associations of gene expression variation in humans. PLoS Genet., 16, e78.
22. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.
23. Lawrence, R.W., Evans, D.M. and Cardon, L.R. (2005) Prospects and pitfalls in whole genome association studies. Philos. Trans. R. Soc. Lond. B Biol. Sci., 360, 1589–1595.
24. Terwilliger, J.D., Haghighi, F., Hiekkalinna, T.S. and Goring, H.H. (2002) A biased assessment of the use of SNPs in human complex traits. Curr. Opin. Genet. Dev., 12, 726–734.
25. Perez-Enciso, M. (2004) In silico study of transcriptome genetic variation in outbred populations. Genetics, 166, 547–554.
26. Hubner, N., Wallace, C.A., Zimdahl, H., Petretto, E., Schulz, H., Maciver, F., Mueller, M., Hummel, O., Monti, J., Zidek, V. et al. (2005) Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet., 37, 243–253.
27. Hughes, T.R., Mao, M., Jones, A.R., Burchard, J., Marton, M.J., Shannon, K.W., Lefkowitz, S.M., Ziman, M., Schelter, J.M., Meyer, M.R. et al. (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol., 19, 342–347.
28. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R. and Shoemaker, D.D. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.
29. Pastinen, T., Ge, B., Gurd, S., Gaudin, T., Dore, C., Lemire, M., Lepage, P., Harmsen, E. and Hudson, T.J. (2005) Mapping common regulatory variants to human haplotypes. Hum. Mol. Genet., 14, 3963–3971.
30. Deutsch, S., Lyle, R., Dermitzakis, E.T., Attar, H., Subrahmanyan, L., Gehrig, C., Parand, L., Gagnebin, M., Rougemont, J., Jongeneel, C.V. and Antonarakis, S.E. (2005) Gene expression variation and expression quantitative trait mapping of human chromosome 21 genes. Hum. Mol. Genet., 14, 3741–3749.
31. Yan, H., Yuan, W., Velculescu, V.E., Vogelstein, B. and Kinzler, K.W. (2002) Allelic variation in human gene expression. Science, 297, 1143.
32. Pastinen, T. and Hudson, T.J. (2004) Cis-acting regulatory variation in the human genome. Science, 306, 647–650.
33. Knight, J.C., Keating, B.J., Rockett, K.A. and Kwiatkowski, D.P. (2003) In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat. Genet., 33, 469–475.
34. Lohmueller, K.E., Pearce, C.L., Pike, M., Lander, E.S. and Hirschhorn, J.N. (2003) Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat. Genet., 33, 177–182.
35. Ronald, J., Brem, R.B., Whittle, J. and Kruglyak, L. (2005) Local regulatory variation in Saccharomyces cerevisiae. PLoS Genet., 19, e25.
36. Knight, J.C., Keating, B.J. and Kwiatkowski, D.P. (2004) Allele-specific repression of lymphotoxin-alpha by activated B cell factor-1. Nat. Genet., 36, 394–399.
37. Cooper, S.J., Trinklein, N.D., Anton, E.D., Nguyen, L. and Myers, R.M. (2006) Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res., 16, 1–10.
38. Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D. et al. (2005) Fine-scale structural variation of the human genome. Nat. Genet., 37, 727–732.
39. Drake, J.A., Bird, C., Nemesh, J., Thomas, D.J., Newton-Cheh, C., Reymond, A., Excoffier, L., Attar, H., Antonarakis, S.E., Dermitzakis, E.T. and Hirschhorn, J.N. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat. Genet., 38, 223–227.
40. Urban, T.J., Sebro, R., Hurowitz, E.H., Leabman, M.K., Badagnani, I., Lagpacan, L.L., Risch, N. and Giacomini, K.M. (2006) Functional genomics of membrane transporters in human populations. Genome Res., 16, 223–230.