Abstract

Recent studies with tiling arrays have revealed more genomic transcription than previously anticipated. Whole new groups of non-coding transcripts (NCTs) have been detected. Some of these NCTs, including miRNAs, can regulate gene expression. To date, most known NCTs studied have been relatively short, but several important regulatory NCTs, including XIST, MALAT-1, BC1 and BC200, are considerably longer and represent a novel class of long, non-coding RNA species. We used whole-genome tiling arrays to identify novel long NCTs across the entire human genome. We identified a new group of long (>400 nt), abundantly expressed NCTs and found that a subset of these are also highly evolutionarily conserved. In this report, we begin to characterize 15 long, conserved NCTs. Quantitative real-time RT–PCR was used to analyze their expression in different normal human tissues and in breast and ovarian cancers. We found altered expression of many of these NCTs in both cancer types. In addition, several of these NCTs showed consistent mutations when sequences of normal samples were compared with a panel of cancer-derived cell lines. One NCT was found to be consistently mutated in a panel of endometrial cancers compared with matched normal blood. These NCTs were among the most abundantly expressed transcripts detected. There are probably many more long, conserved NCTs, albeit with lower levels of expression. Although the function of these NCTs is currently unknown, our study indicates that they may play an important role in both normal cells and in cancer development.

INTRODUCTION

Results of several recent large-scale tiling experiments suggest that a significantly higher proportion of human genomic sequence is transcribed than can be accounted for by existing exon annotation, with much of the excess not encoding protein (1–6). Genetic research has focused on the expression of protein-coding genes, although the coding portion of genes accounts for, at most, 2% of the entire human genome. Researchers have now begun to explore novel, non-coding RNA (ncRNA) species to characterize their potential role in regulatory processes and disease development. The human genome includes a diverse collection of ncRNAs, such as miRNAs and piRNAs (18–30 nt), short translational–regulatory RNAs (100–200 nt), and much longer RNAs (up to 10 000 nt) involved in gene silencing (7,8). Others have reported ncRNAs characterized by their size (longer or shorter than 200 nt), cellular location (nuclear or cytosolic) and location in relation to gene boundaries (9). Such findings indicate that various undiscovered classes of RNA species exist throughout the non-coding regions of the genome, possibly associated with cellular functions and potentially targets of alteration in a variety of diseases. Because such species in the 50–55% of the non-repetitive portion of the human genome corresponding to intergenic or intronic sequences have been recognized only relatively recently, there are few publications on ncRNAs compared with publications on the 1–2% of the genome containing protein-coding sequences (10–12).

Some ncRNAs function to regulate and direct complex pathways (13–17). An assortment of small ncRNA species (e.g. miRNA, piRNA and snoRNA) are involved in critical cellular processes and, in some cases, have been linked to specific disease states including cancer (14–20). MiRNAs have been associated with initiation and progression of cancer by acting as tumor suppressors or oncogenes and, thus, could be cancer therapy targets (14,15,21). Collectively, transcriptome analyses comparing tumor and normal cells suggest that defects in ncRNAs, regardless of their length, do occur in tumors (14,15). Furthermore, the deregulation of several ncRNAs (apart from miRNAs) has been observed in prostate, colon, ovary, liver, breast, cervix, esophagus and tongue cancers, as well as leukemias and lymphomas [reviewed in (14)]. Such reports indicate that there may be a new class of longer ncRNA species that are involved in critical regulatory processes and are targets of alteration during the development of cancer.

Currently, there are 940 unique human ncRNAs documented on the RNAdb website (http://www.research.imb.uq.edu.au/rnadb/), excluding antisense, sno, pi and miRNAs. This database and other reports provide evidence of longer ncRNAs (>100 nt) within the genome, but few have been studied in detail (13,14,22,23). A recent mapping project suggested that an appreciable portion of long, unannotated transcripts (potential long ncRNAs) could serve as precursors for smaller RNA species <200 nt in length (9). Nevertheless, long ncRNAs such as TSIX and XIST, which are not precursors for smaller RNA species, have been found to be important for the regulation of chromosome X silencing (8,18). It has also been reported that a few long ncRNAs are selectively expressed in tumor cells, but not in corresponding normal cells (13,14,19,20). Specifically, six documented long ncRNA species ranging from 152 to 8000 nt have been shown to have some role in carcinogenesis, and several others are implicated in various neurological diseases [reviewed in (14,24)]. For instance, the highly conserved, long non-coding 8 kb MALAT-1 transcript is specifically involved in non-small cell lung carcinoma (25). In addition, an antisense intronic transcript has been found to correlate with the degree of tumor differentiation in prostate cancer (19,20,24). The ncRNAs BC1 and BC200 are selectively expressed in tumor cells, but not in corresponding normal tissue (19,20). Collectively, these studies demonstrate the importance of long ncRNAs in cancer. Although the functions of ncRNAs are likely to be diverse, both logic and evidence strongly suggest their role is to regulate and direct complex pathways (8,13,26).

In an attempt to study novel, long ncRNA species that may have importance in cellular function and cancer development, a tiling array platform was used to obtain unbiased transcription data across the entire genome in normal human bronchial epithelial (NHBE) cells. We hypothesized that a new group of long ncRNA species is involved in crucial regulatory functions and, when modulated, plays a role in cancer. To test this hypothesis, we chose a subset of abundantly expressed non-coding transcripts (NCTs) that are >400 nt long and display a high degree of sequence conservation. DNA sequences that have functional importance are often conserved among species (26,27). The expression of 15 transcripts that originate from non-coding sequence was then measured in 12 different normal tissues, as well as in a panel of breast and ovarian cancers. From this study, a subset of abundantly expressed, highly conserved NCTs that are aberrantly expressed in breast and ovarian cancer tissues was discovered. Several of these NCTs seem to be susceptible to mutation in a panel of cancer-derived cell lines, and one NCT was observed to have four different mutations in several endometrial tumors. Collectively, our results demonstrate a novel group of NCTs that may have an important impact on cancer development.

RESULTS

Abundantly expressed long NCTs

The aim of this study was to investigate the possible involvement of long, conserved NCTs in cancer development. To this end, we scanned cDNA from the entire non-redundant portion of the human genome using tiling arrays and focused on the identification of only the most abundantly expressed regions (i.e. consecutive probes with signal in the 99.5th signal intensity percentile). At this signal threshold, we found a total of 53 972 transcriptionally active regions (TARs) in NHBE that occur, on average, every 59 kb across the genome. In addition, we observed a substantial increase in the number of TARs as we decreased the threshold to the 99th (∼99 000 TARs) and 98th (∼210 000 TARs) signal intensity percentile. Since there were so many TARs of various lengths and expression levels, we focused our study only on the most abundantly expressed regions (i.e. TARs in the top 0.5% of the tiling array signal), which included highly expressed housekeeping genes such as β-actin. At this threshold level, we identified 578 TARs that were longer than 400 nt. Of these, 495 originate from non-protein coding sequence (intergenic or intronic regions), and we bioinformatically examined these regions in order to choose candidate NCTs for subsequent real-time RT–PCR (qPCR), northern blot, and sequencing experiments.
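
The selection rule described above (runs of consecutive probes at or above a signal percentile, kept only if longer than 400 nt) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the nearest-rank percentile, and the way interval length is measured are assumptions; only the 35 nt probe length, the 99.5th percentile and the 400 nt cutoff come from the text.

```python
import math

PROBE_LEN = 35  # probe length in nt, as stated in the Discussion


def percentile_threshold(signal, pct=99.5):
    """Nearest-rank percentile of the probe signal (an assumed definition)."""
    ordered = sorted(signal)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def call_tars(starts, signal, threshold, min_len=400):
    """Return (start, end) intervals of consecutive probes with signal >= threshold.

    `starts` are probe start coordinates sorted along one chromosome;
    only intervals longer than `min_len` nt are kept.
    """
    tars, run_start, prev = [], None, None
    for pos, value in zip(starts, signal):
        if value >= threshold:
            if run_start is None:
                run_start = pos  # open a new run of high-signal probes
            prev = pos
        elif run_start is not None:
            tars.append((run_start, prev + PROBE_LEN))  # close the run
            run_start = None
    if run_start is not None:  # run reaching the last probe
        tars.append((run_start, prev + PROBE_LEN))
    return [(s, e) for s, e in tars if e - s > min_len]


# Toy data: 40 adjacent probes, one long and one short high-signal run.
starts = list(range(0, 35 * 40, 35))
signal = [1.0] * 40
for i in list(range(5, 20)) + list(range(30, 33)):
    signal[i] = 10.0
long_tars = call_tars(starts, signal, threshold=5.0)
```

In practice the threshold would be computed once over all probes on the array (for example with `percentile_threshold`) rather than per region, so that the 99.5th percentile refers to the whole data set.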

Conservation of long NCTs

As an indication of functional importance, sequence conservation of the highly expressed TARs across 17 diverse species was analyzed using the UCSC Human Genome Browser. Of the 578 long, abundantly expressed TARs examined, 232 displayed a degree of sequence conservation >5%. The majority (163 or 70.2%) of these conserved regions originated from non-coding sequences. When we examined only those TARs that were at least 20% conserved, we found 113 TARs, more than half of which (61 TARs) originated from non-coding sequences (15 intronic and 46 intergenic). In addition, there were 27 intergenic and 5 intronic TARs that displayed a degree of conservation >50%. Of these, 10 were >90% conserved across 17 species. From the long, abundantly expressed, conserved, non-coding TARs identified, we chose to examine 15 NCTs displaying varying degrees of conservation. We studied four NCTs with moderate levels of conservation (34–69%), five with greater levels of sequence conservation (70–98%) and six that are ultraconserved (>99%). This assortment of NCTs was selected to examine their expression in 12 normal human tissues, to test whether they are differentially expressed in breast and ovarian cancer, and to determine whether these regions are targeted for mutation. This list of variably conserved NCTs allowed us to investigate whether there was any correlation between the degree of conservation and altered expression or frequency of mutation in cancers. Each NCT was analyzed comprehensively, as described in Materials and Methods, for sequence conservation, presence of stop codons, repetitive regions, G-C content, etc., to be certain these regions were representative of conserved, non-coding regions. UCSC conservation graphs and corresponding transcriptional signal from tiling arrays of two representative NCTs are displayed in Figure 1A and B. Those NCTs that exhibit >90% sequence conservation across 17 species are denoted with an asterisk.
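
As a small cross-check of the three conservation groups, the conservation percentages listed for the 15 NCTs in Table 2 can be binned with the cutoffs given in the text. The bin labels and the function name are shorthand introduced here for illustration only.

```python
from collections import Counter

# Percent conservation per NCT, copied from Table 2.
CONSERVATION = {
    "NC04*": 99.94, "NC05*": 99.78, "NC06": 71.90, "NC21*": 99.66,
    "NC22*": 96.96, "NC25": 68.17, "NC26*": 99.71, "NC28*": 99.57,
    "NC29": 34.70, "NC30*": 99.33, "NC31": 86.89, "NC33": 77.22,
    "NC35": 60.15, "NC39": 63.07, "NC40": 85.00,
}


def conservation_class(pct):
    """Bin a conservation percentage using the ranges given in the text."""
    if pct > 99:
        return "ultraconserved"  # >99%
    if pct >= 70:
        return "high"            # 70-98%
    return "moderate"            # 34-69%


class_counts = Counter(conservation_class(p) for p in CONSERVATION.values())
over_90 = {name for name, pct in CONSERVATION.items() if pct > 90}
starred = {name for name in CONSERVATION if name.endswith("*")}
```

With these values the bins contain four, five and six NCTs respectively, and the asterisked names coincide with the NCTs that are >90% conserved.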

Figure 1.

Genomic regions that span two representative NCTs, (A) NC4* located on chromosome 12 (62 502 932–62 503 496) and (B) NC26* located on chromosome 4 (136 186 664–136 187 064), are displayed in both the Integrated Genome Browser (IGB) and UCSC Human Genome Browser. In the top IGB panel, the probe signal is represented by green vertical bars. The green horizontal bars below the signal indicate regions of consecutive probes expressing in the top 0.5% of the array data. The UCSC display below corresponds to the exact region represented in the IGB display above. As depicted in the UCSC Human Genome Browser, each NCT exhibits large regions of high conservation across multiple species, no alignment with coding regions of predicted or known genes, no repetitive elements, and overlap with previously identified TARs (3).

Validation of potential NCTs

To validate our tiling array observations, northern blotting experiments were performed to determine the lengths of detected transcripts. Representative blots are displayed in Figure 2. Both lanes on each image represent the same RNA sample run in duplicate and transferred to each membrane; hence the two bands displayed for many NCTs in Figure 2. As given in Table 1, NCT size was characterized in two ways. First, regions of consecutive probes displaying signal in the 99.5th percentile were documented as the initial indication of transcript size. This is most likely an underestimate of the actual transcribed region. Second, we therefore also documented the area surrounding this region that is transcriptionally active, but at a level lower than the 99.5th percentile. Our northern results show that the resolved NCT bands tend to correspond in size to the entire transcribed area rather than to the size of the fragment derived from only the chain of consecutive probes expressing above the 99.5th signal intensity percentile threshold. In addition to northern blots, qPCR was used to precisely quantitate the expression of each NCT in different human tissues and in a panel of breast and ovarian cancers.

Figure 2.

Northern blots displaying the expression of representative NCTs in NHBE. For most NCTs, duplicate samples were run to confirm reproducibility. Arrows on the right of each image correspond to RNA marker bands used to estimate the NCT size. When band size is compared with signal from the tiling array, the resolved NCT bands tend to correspond in size to the entire transcribed area rather than to the size of the transcript derived from only the chain of consecutive probes expressing above the 99.5th signal intensity percentile threshold. NCTs denoted with an asterisk (*) are >90% conserved.

Table 1.

NCT genomic position, size (according to the 99.5th signal percentile and total area of transcription) and origin in the human genome

NCT Cytoband Coordinates (start–stop) Size at P99.5 (nt) Approx. length of transcription (kb) Nearest gene or exon (distance, kb) Origin of transcript (IGB, RefSeq, UCSC predictions, Ensembl)
NC04* 12q14.2 62 502 932–62 503 496 565 2.2–2.4 TMEM5 (13.8) NC NC NC NC 
NC05* 12p12.3 17 035 548–17 036 128 581 0.9–1.1 LMO3 (384) NC NC NC NC 
NC06 9p24.1 7 467 123–7 467 558 436 1.5–1.7 JMJD2C (302) NC NC NC NC 
NC21* 12q14.2 61 435 558–61 435 946 409 1.0–1.1 FLJ25590 (153) NC 
NC22* 3q24 149 366 637–149 367 000 404 0.4–1.6 AGTR1 (531) NC NC NC NC 
NC25 6q13 72 351 607–72 352 009 403 1.3–1.5 C6orf155 (163) NC NC NC NC 
NC26* 4q28.3 136 186 664–136 187 064 401 1.6–1.7 PCDH10 (1840) NC NC NC NC 
NC28* 8p12 34 851 236–34 851 648 413 0.7–1.6 UNC5D (665) NC NC NC NC 
NC29 15q21.3 54 152 112–54 152 543 432 2.4–2.7 RFXDC2 (17.6) NC NC NC NC 
NC30* 8q21.13 81 633 845–81 634 254 410 1.3–1.8 ZBTB10 (38.8) NC NC NC 
NC31 12q23.3 103 183 218–103 183 704 487 0.6–0.7 TXNRD1 (21.3) NC 
NC33 5q21.1 97 703 482–97 704 002 521 1.7–1.9 RGMB (424) NC NC NC NC 
NC35 11p15.4 10 486 015–10 488 239 2225 2.4–3.3 AMPD3 (1.4) NC NC E(60%)/NC NC/E(10%) 
NC39 3q11.2 97 818 707–97 820 019 1313 1.3–1.4 ARL6 (1145) NC NC NC NC/E(5%) 
NC40 3q27.2 186 618 028–186 619 252 1225 1.3–1.4 MAP3K13 (10.2) 

NCTs that exhibit >90% sequence conservation across 17 species are denoted with an asterisk (*).

P99.5, 99.5th percentile; IGB, Integrated Genome Browser; NC, intergenic; I, intron; E, exon.

Expression analysis of conserved NCTs in normal human tissues

Non-coding transcript tissue specificity was examined by analyzing expression in various normal human tissues and cell lines using qPCR (Fig. 3A–C). Most of the NCTs were expressed in all samples, but at different levels. Some NCTs displayed restricted tissue expression. For instance, all 12 tissues examined expressed most of the 15 NCTs, with the exception of NC25 (not expressed in breast), NC30* (not expressed in kidney and liver) and NC39 (not expressed in prostate, ovary, cervix, breast, kidney and liver). The most abundantly transcribed NCTs across all 12 tissues examined (ranked in descending order) were NC35, NC28* and NC31. Notably, NC35 displayed expression levels 12-fold higher than β-actin in most tissues tested. Conversely, NC21*, NC33 and NC26* consistently displayed low expression levels in most samples examined. In general, prostate, ovary and brain exhibited the lowest levels of NCT expression. On the other hand, 9 of 15 NCTs displayed higher levels of transcription than β-actin in normal tonsil, spleen and myometrium. One possible explanation for this observation in spleen and tonsil is that both contain a substantial fraction of highly proliferative lymphocytes, which would suggest a possible role for these NCTs in immune response and/or cell proliferation. Collectively, these results demonstrate that some NCTs exhibited tissue-specific expression whereas others were ubiquitously expressed, and that there was minimal correlation between tissue expression and sequence conservation.
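
To relate a fold difference such as the 12-fold value for NC35 to the ΔCt scale used in Figure 3, the following worked relation may help. It assumes the common convention in which ΔCt is the NCT cycle threshold minus the β-actin cycle threshold, and roughly 100% amplification efficiency (one doubling per cycle); neither assumption is stated explicitly in this section.

```latex
\Delta C_t = C_t^{\mathrm{NCT}} - C_t^{\beta\text{-actin}}, \qquad
\text{expression relative to } \beta\text{-actin} = 2^{-\Delta C_t}
\\[4pt]
2^{-\Delta C_t} = 12 \;\Longrightarrow\; \Delta C_t = -\log_2 12 \approx -3.58
```

Under these assumptions, a transcript 12-fold more abundant than β-actin crosses the detection threshold about 3.6 cycles earlier than β-actin.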

Figure 3.

NCT expression in 12 different normal tissues measured by qPCR. Expression values (ΔCt) are relative to β-actin, but were also compared with other housekeeping genes including GAPDH (data not shown). The expression levels of 15 NCTs in normal (A) prostate, cervix, ovary and breast, (B) brain, kidney, lung and liver, (C) tonsil, spleen, myometrium and keratinocytes. Most of the NCTs were expressed in all samples, but at different levels. Some NCTs displayed restricted tissue expression, whereas others exhibited ubiquitous expression in all normal samples tested. All NCTs displayed high levels of expression in tonsil, spleen and myometrium whereas prostate, ovary and brain exhibited the lowest levels of NCT expression. NCTs denoted with an asterisk (*) are >90% conserved.

Aberrant expression of several NCTs in ovarian and breast cancer

Expression of the 15 NCTs was examined by qPCR in a panel of ovarian and breast cancer samples. The breast panel comprised normal cell lines, benign tissue, cancer-derived cell lines and primary breast tumors. The ovarian panel comprised short-term cultures of normal ovarian epithelium, ovarian cancer cell lines and primary ovarian tumors. The results revealed that more NCTs were aberrantly expressed in breast tumor samples than in the ovarian cancer tissues (Fig. 4A and B and Table 2, Supplementary Material, Tables S1 and S2). We also found that several NCTs displayed differential expression when comparing normal and cancer samples in both breast and ovary. Interestingly, NCTs were typically upregulated in ovarian cancers but downregulated in breast cancers. Three NCTs (NC25, NC29 and NC31) were observed to be upregulated in more than half of the 20 ovarian cancer samples examined. Although these three transcripts were altered in more samples than any of the other 12 NCTs, they also displayed a lower degree of sequence conservation (<90%). NC29, which exhibited the lowest degree of conservation (34.7%), was significantly upregulated in all 20 ovarian cancer samples tested. Thus, changes in the transcription levels of the most highly conserved transcripts (>90% conserved) occurred at a lower rate than for less-conserved NCTs, as is evident for NC4*, NC5*, NC26*, NC28* and NC30*. Among the highly conserved NCTs, only NC21* (99.6%) and NC22* (96.9%) showed modulated expression in a moderate number of ovarian cancer samples (seven samples upregulated for each NCT). Conversely, the majority of NCTs were consistently downregulated in breast tumors compared with normal samples. Of the 17 breast cancer samples tested, at least nine samples displayed downregulated expression for 8 of the 15 NCTs, including four of the most highly conserved NCTs (NC5*, NC21*, NC26* and NC30*). Thus, these data indicate that the most highly conserved NCTs are more susceptible to expression modulation in breast tumors than in ovarian cancer. It is interesting to note that the most modulated NCTs in ovary had less altered expression in breast cancer and vice versa. For example, the NCTs that had the most samples with altered expression in ovary (NC25, NC29 and NC31) had the fewest samples exhibiting altered expression in breast. In contrast, NC30*, NC33 and NC39 displayed among the highest number of samples with altered expression in breast cancer, but in ovarian cancer there were very few samples with modulated expression levels for these same NCTs. This observation could indicate that some of these NCTs are specific targets in the development of either ovarian or breast cancer, but not both.
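
The criterion used to call a sample altered (stated in the Figure 4 legend: a ΔCt more than 2 beyond the range of the normal samples, i.e. more than 4-fold, together with P < 0.05) can be sketched as below. This is an illustration under stated assumptions rather than the authors' analysis code: the function name is hypothetical, the P-value is taken as an input because the statistical test is not described in this section, and the sign convention (a higher ΔCt meaning less transcript) is assumed.

```python
def classify_expression(tumor_dct, normal_dcts, p_value, min_shift=2.0, alpha=0.05):
    """Classify one tumor sample against the range of normal delta-Ct values.

    Assumes delta-Ct = Ct(NCT) - Ct(reference gene), so a HIGHER value
    means LOWER expression. A shift of 2 cycles corresponds to 2**2 = 4-fold.
    """
    low, high = min(normal_dcts), max(normal_dcts)
    if p_value >= alpha:
        return "unchanged"           # not statistically significant
    if tumor_dct > high + min_shift:
        return "downregulated"       # more than 4-fold below the normal range
    if tumor_dct < low - min_shift:
        return "upregulated"         # more than 4-fold above the normal range
    return "unchanged"


# Hypothetical delta-Ct values, for illustration only.
normals = [4.0, 4.5, 5.0]
calls = [classify_expression(x, normals, p_value=0.01) for x in (7.5, 1.5, 6.0)]
```

Comparing against the full range of normal samples, rather than their mean, makes the call conservative: a tumor must fall outside every normal value by at least two cycles.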

Figure 4.

Modulated expression of NCTs in cancer as measured by qPCR. (A) Number of ovarian tumor samples (n = 20 total) with altered (up- or downregulated) NCT expression. (B) Number of breast tumor samples (n = 17) with altered (up- or downregulated) NCT expression. Contrary to the ovarian cancer results, many downregulated NCTs were observed in the majority of breast tumors examined. Numerous downregulated NCTs were observed in at least half of the breast tumor samples tested. Cancer samples displaying an average expression change >4-fold (a ΔCt of more than 2 greater or less than the range observed in the normal samples) and a P-value <0.05 were considered significantly different from normal samples (Supplementary Material, Tables S1 and S2). NCTs denoted with an asterisk (*) are >90% conserved.

Table 2.

NCT sequence conservation, modulated expression in cancer, number of mutations (in cancer–derived cell lines and endometrial tumors) and correspondence to previously identified TARs by Bertone et al. (3)

NCT Conservation No. of breast cancer samples No. of ovarian cancer samples No. of mutations Bertone data 
  Upregulated Downregulated Upregulated Downregulated Cell lines Endometrium  
NC04* 99.94 TAR 3719, 3720 
NC05* 99.78 16 TAR 2234, 2235 
NC06 71.90 10 TAR 16446 
NC21* 99.66 13 TAR 3710,3711,3712 
NC22* 96.96 – – 
NC25 68.17 14 – 
NC26* 99.71 11 TAR 11287,11288,11289 
NC28* 99.57 – – 
NC29 34.70 20 – – 
NC30* 99.33 12 – – 
NC31 86.89 14 – – 
NC33 77.22 13 – – 
NC35 60.15 – – TAR 2356,2357,2358,2359 
NC39 63.07 13 – – TAR 9515,9516,9517 
NC40 85.00 – – TAR 10413,10414,10415 

NCTs that exhibit >90% sequence conservation across 17 species are denoted with an asterisk (*).

Sequencing analysis for mutations in NCTs

We next sequenced the NCTs in a panel of normal samples and cancer-derived cell lines from cancers of the stomach, breast, ovary, brain, pancreas, colon, esophagus and lung to determine whether any of these conserved NCTs were vulnerable to mutation in cancer development. We found relatively few potential mutations in this panel; however, there were some nucleotides within some of the NCTs that were consistently altered in distinct samples from different tissues (Table 3). This observation suggests that such potential mutations may be selected during tumorigenesis, rather than being random mutations in unstable cancer cells. Overall, six NCTs showed alterations (possible mutations) when sequences of normal samples were compared with cancer-derived cell lines from various tissues (including breast, ovary, pancreas, colon, esophagus, lung, stomach and brain). In particular, NC5*, NC6 and NC21* displayed consistent alterations at a specific nucleotide position in at least two different cancer samples from the panel. Our results do not provide enough evidence to conclude that one cancer type is more likely to be mutated; however, the cancer-derived cell lines that exhibited the most mutated NCTs were derived from ovarian (mutations in NC5*, NC6 and NC21*) or esophageal cancers (two mutations in NC6 and one in NC21*). In addition, mutations in NC4* and NC26* were observed in colon samples, and mutations in NC5* and NC21* were observed in stomach cancer. As a whole, most of the mutations observed were simple substitution or addition mutations; however, we observed a 4 nt deletion at the same nucleotide position within NC21* in cell lines derived from four different tissues (pancreas, ovary, stomach and esophagus). There was very little correlation between mutated NCTs and any altered expression in ovary; however, the mutated NC5*, NC6 and NC21* were also downregulated in breast cancer. Overall, we did not observe a strong correlation between high NCT conservation and mutational events. However, four of the six mutated NCTs (NC4*, NC5*, NC21* and NC26*) had sequence conservation >90%.
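
The notion of a recurrent alteration (the same nucleotide position altered in at least two different cell lines) can be made concrete by grouping the entries of Table 3 by NCT and position. The sketch below copies the table as printed; the function name and the parsing of the position from the parenthesized prefix are assumptions made for illustration.

```python
from collections import defaultdict

# (NCT, mutation, cell line, tissue), copied from Table 3.
TABLE3 = [
    ("NC4*", "(274)C→CT", "RKO", "Colon"),
    ("NC5*", "(217)T→C", "SKOV3", "Ovary"),
    ("NC5*", "(63)C→CT", "GBM8", "Brain"),
    ("NC5*", "(63)C→CT", "AGS", "Stomach"),
    ("NC6", "(68)G→GC", "OVCAR5", "Ovary"),
    ("NC6", "(68)G→C", "SU8686", "Pancreas"),
    ("NC6", "(68)G→C", "KYSE140", "Esophageal"),
    ("NC6", "(68)G→C", "KYSE410", "Esophageal"),
    ("NC21*", "(318_321)delATGA", "BXPC3", "Pancreas"),
    ("NC21*", "(319-322)delTGAA", "SKOV3", "Ovary"),
    ("NC21*", "(319-322)delTGAA", "AGS", "Stomach"),
    ("NC21*", "(319-322)delTGAA", "OE33", "Esophageal"),
    ("NC25", "(67)T→TC", "MDA468", "Breast"),
    ("NC26*", "(329)G→GT", "RKO", "Colon"),
]


def recurrent_positions(rows, min_samples=2):
    """Map (NCT, position) to the cell lines altered there, if >= min_samples."""
    by_site = defaultdict(list)
    for nct, mutation, cell_line, _tissue in rows:
        position = mutation[1:mutation.index(")")]  # text inside the parentheses
        by_site[(nct, position)].append(cell_line)
    return {site: lines for site, lines in by_site.items() if len(lines) >= min_samples}


recurrent = recurrent_positions(TABLE3)
recurrent_ncts = {nct for nct, _position in recurrent}
```

Grouping by position rather than by the full mutation string treats (68)G→GC and (68)G→C as the same site, which matches the way the text counts NC6.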

Table 3.

Mutations found in sequenced NCTs in various cancer-derived cell lines and brain xenografts

NCT Mutation location Cell line Tissue 
NC4* (274)C→CT RKO Colon 
NC5* (217)T→C SKOV3 Ovary 
 (63)C→CT GBM8 Brain 
 (63)C→CT AGS Stomach 
NC6 (68)G→GC OVCAR5 Ovary 
 (68)G→C SU8686 Pancreas 
 (68)G→C KYSE140 Esophageal 
 (68)G→C KYSE410 Esophageal 
NC21* (318_321)delATGA BXPC3 Pancreas 
 (319-322)delTGAA SKOV3 Ovary 
 (319-322)delTGAA AGS Stomach 
 (319-322)delTGAA OE33 Esophageal 
NC22* None – – 
NC25 (67)T→TC MDA468 Breast 
NC26* (329)G→GT RKO Colon 
NC28* None – – 
NC29 None – – 
NC30* None – – 
NC31 None – – 
NC33 None – – 
NC35 Did not sequence – – 
NC39 Did not sequence – – 
NC40 Did not sequence – – 

NCTs that exhibit greater than 90% sequence conservation across 17 species are denoted with an asterisk (*).

It is possible that the sequence changes presented above are polymorphic in nature. Therefore, altered regions were aligned and compared against all known SNPs from human genome databases to increase the probability that the changes we observed are real mutations. In addition, potential mutations observed in cancer-derived cell lines were cross-checked against the sequences of 11 normal samples, which confirmed that these changes were not polymorphic within our normal sample panel. Nonetheless, the possibility still exists that the changes we observed are rare polymorphisms within a specific cell type. In an attempt to address this issue, NCTs that had the most potential mutations in cancer-derived cell lines were further sequenced in a panel of 48 matched endometrial samples (i.e. cancer versus normal endometrial tissue from the same patient). No mutations of NC5*, NC6 or NC21* (the three NCTs that had the most significant alterations in cancer-derived cell lines) were observed in endometrial cancer (Table 4). However, NC25, which was mutated in only one of the cancer-derived cell lines, was found to be frequently mutated in the endometrial cancer panel (in 23 of the 48 samples tested). These samples had matched normal DNA (from blood), and thus the observed alterations are mutations and not polymorphisms. In addition, each of the four distinct mutations observed was found in multiple samples. Since there was only a single possible mutation in the cancer-derived cell line panel, NC25 could be a mutational target specifically in endometrial cancer.
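
The reasoning in this paragraph (exclude known SNPs, then exclude any change also present in the matched normal sample from the same patient) amounts to a simple decision rule, sketched below. The function name, the category labels and the example variant strings are hypothetical and are used only to illustrate the logic; they are not taken from the study's data.

```python
def classify_change(change, tumor_calls, normal_calls, known_snps):
    """Classify a sequence change seen in a tumor using its matched normal.

    All arguments other than `change` are sets of variant strings; the
    labels returned are illustrative, not terminology from the paper.
    """
    if change not in tumor_calls:
        return "not observed in tumor"
    if change in known_snps:
        return "known polymorphism"
    if change in normal_calls:
        return "germline change (possible rare polymorphism)"
    return "candidate somatic mutation"


# Hypothetical calls for one patient, for illustration only.
tumor = {"varA", "varB", "varC"}
matched_normal = {"varB"}
snp_database = {"varC"}
labels = {v: classify_change(v, tumor, matched_normal, snp_database) for v in sorted(tumor)}
```

A change absent from the patient's own normal DNA cannot be an inherited polymorphism in that patient, which is why the matched design supports calling the NC25 alterations mutations.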

Table 4.

A total of 23 different endometrial tumor specimens were observed having mutations in NC25

Sample pair Position and mutation 
1/2 (251)A→AG 
1/2 (373)G→GA 
15/16 (250)A→AG 
17/18 (373)G→GA 
23/24 (251)A→AG 
29/30 (250)A→AG 
31/32 (251)A→AG 
31/32 (373)GA→G 
31/32 (401)A→AT 
35/36 (382)CT→C 
41/42 (249)A→AG 
43/44 (249)A→AG 
43/44 (371)GA→G 
45/46 (248)A→AG 
47/48 (384)CT→C 
55/56 (369)GA→G 
57/58 (249)A→G 
57/58 (399)A→T 
59/60 (252)AG→A 
59/60 (402)AT→A 
67/68 (370)GA→G 
69/70 (252)A→G 
69/70 (402)A→T 
71/72 (252)AG→A 
71/72 (402)AT→A 
73/74 (252)A→AG 
79/80 (248)G→A 
79/80 (398)T→A 
83/84 (252)AG→A 
83/84 (402)AT→A 
85/86 (252)A→AG 
91/92 (251)G→A 
91/92 (401)T→A 
95/96 (250)AG→A 
95/96 (400)AT→A 

DISCUSSION

Since the completion of the human genome project, many high-throughput strategies have been developed to screen and analyze the human genome. One such technology is tiling arrays with probes for determining the transcriptional activity of the entire non-repetitive portion of the genome. In this study, a 35 nt probe tiling array was used to scan NHBE cells to find novel, conserved, highly transcriptionally active non-coding regions with potential cellular function and involvement in disease development. The list of ∼54 000 most highly expressed TARs was subsequently narrowed to 15 candidate regions that were then analyzed in detail. Many conserved, abundantly expressed NCTs were identified, but potentially many more would have been discovered had we not applied a stringent filter to capture only those NCTs expressed at extremely high levels. Some of the observed non-coding TARs could originate from the antisense strand, because our protocol converts RNA into double-stranded cDNA, thus the strandedness cannot be determined from these data. Further experimentation (e.g. 3′ and 5′ RACE) is necessary to clarify their precise location as well as the exact size of each transcript, and other experiments are also necessary to determine which strand encodes each of these NCTs.

One unique aspect of this study is our focus on only the most highly transcribed non-coding sequences spanning regions of consecutive tiling probes across at least 400 nt. In addition to this criterion, we also specifically chose to examine those regions that were moderately to highly conserved across 17 species, rather than comparing sequences with two or three closely related species. Seven of the 15 NCTs analyzed in this study represent newly discovered, conserved and highly transcribed sequences derived from NHBE cells. The other eight NCTs have been identified previously using a different tiling array platform and model system (Table 2) (3). However, they have not been characterized further until now. A similar study also demonstrated that a group of ultraconserved ncRNAs are altered in cancer (28). However, that study did not utilize a tiling array to discover these transcripts. In addition, the expression of many of those transcripts is significantly lower than that of the transcripts described in this report. Our study provides additional evidence that ncRNAs play a role in cancer development and shows the alteration of NCT expression in breast and ovarian cancer. Another unique aspect of this study is our demonstration of potential consistent mutations in a panel of cancer-derived cell lines. Unfortunately, we did not have matched normal DNA for any of these cell lines, so we cannot definitively say that these are not rare polymorphisms. However, when we analyzed a panel of 48 primary endometrial cancers with matched normal DNA, we found that one of the NCTs was consistently mutated in almost half of the endometrial cancers examined. The observation of consistent mutations suggests that these are not random mutations in unstable cancer cells, but could have an important functional role. Thus, this provides further evidence that these conserved NCTs, in addition to having altered expression in cancers, are also mutational targets.

Most studies in cancer genetics describe mutations or other alterations (e.g. overexpression, or downregulation) in protein-coding genes. The recent identification of new classes of ncRNA species implicated in important steps of cancer reinforces the role of these transcripts in the process of tumorigenesis (14,15). We report that the expression of most NCTs is altered in ovarian and breast cancer samples. This indicates that some of the 15 NCTs examined could be involved in the development of cancer in these tissues. Recent publications support a similar conclusion in other cancers (14,19,20,24,29,30). Our study also demonstrates that even the most conserved, long ncRNAs could be crucial targets for mutation. Such an idea is somewhat counterintuitive because one might assume ultraconserved sequences should be highly protected from mutation. Our results suggest that all non-coding sequences, whether they are conserved or not, can be susceptible to mutagenesis. However, one study demonstrated that extreme sequence conservation does not necessarily reflect crucial functions required for viability (31). Nevertheless, the fact that we observed some consistent mutations is supportive of the concept that these are not just random mutations in unstable cancer cells, but that specific alterations could play a functional role in cancer development.

Since the function of most ncRNAs is currently unknown, it is difficult to speculate whether they exhibit tissue-specific roles. Several NCTs displayed expression in all tissues (although at different levels), but others had more restricted expression. In addition, the fact that all NCTs displayed high levels of transcription in normal tonsil and spleen is possibly because of a substantial fraction of highly proliferative lymphocytes in both tissues, indicating a possible role in immune response and/or cell proliferation. This idea is supported by a recent study that observed a high number of transcribed, ultraconserved regions in B-cells (28). Further supporting a possible connection between long NCTs and the immune response, another group investigated the expression patterns of a 17 kb NCT, NTT, and their findings indicate that NTT is specifically induced upon T-cell activation (32). In contrast, non-proliferative tissues such as brain, liver and kidney had very low expression of almost all of these long, conserved NCTs. This result could indicate that these NCTs have a very limited role in these tissues. The very low NCT expression levels observed in brain samples do contradict recent reports suggesting that conserved NCTs are associated with genes involved in brain function (33,34). Nevertheless, the results suggest a correlation between NCT expression and degree of cell proliferation. Thus, this new group of NCTs could have a regulatory role in cellular proliferation and/or immune response.

Although most known long ncRNAs do not show evolutionary conservation (35), we present a group of transcriptionally active NCTs that are, in fact, conserved. The entire regions encoding six of the NCTs (NC4*, NC5*, NC21*, NC26*, NC28* and NC31*) were >99% conserved across 17 species, which could classify them as ultraconserved sequences [previously defined as segments >200 nt long and having 100% identity among orthologous regions of the human, rat and mouse genomes (36)]. However, the study where this definition was proposed did not examine whether or not such segments were transcribed. As much as ten times more of the conserved fraction of the genome is non-coding than coding by some estimates (26,37,38). DNA sequences that have functional importance are often conserved among species because of negative selection (26,33,39,40). Thus, sequence comparison between divergent species can be a useful method for identifying functional candidate sequences in the non-coding, non-repetitive regions of the genome. It is speculated that novel, functional ncRNAs should have high interspecies conservation patterns similar to those of known functional ncRNAs, such as miRNAs, piRNAs and snoRNAs (10,16). For example, the high conservation of known miRNAs is presumably because their sequence is constrained by functional interaction with multiple targets (10,41,42). However, many functionally important ncRNAs are rapidly evolving (34,43–45). Thus, the degree of conservation should not be the sole screening determinant of potential function.

The function of these NCTs is currently unknown. Their abundance, tremendous sequence conservation and aberrant expression in many ovarian and breast cancers suggest that they not only play an important role in normal cellular growth and immune response, but also in the development of cancer. Consistent potential mutations at specific nucleotides within some of these NCTs also suggest that these alterations are not random, but have an important functional role in cancer development. Taken together, there is a possibility that some of these ncRNAs could be used as targets for drug development and early diagnosis and prognosis. Long ncRNA species have the potential to be as important as protein-coding genes in cancer biology. Future studies will attempt to determine the mechanistic role of long NCTs in normal cellular function and disease, and how they might interact with protein-coding genes within crucial signaling pathways.

MATERIALS AND METHODS

Cell culture

Cryopreserved NHBE cells were purchased from Cambrex Bio Science Walkerville, Inc. (Walkerville, MD, USA) and grown in 5% CO2 at 37°C in defined bronchial epithelial cell basal medium (Cambrex) containing bovine pituitary extract, human epidermal growth factor, insulin, hydrocortisone, transferrin, epinephrine, triiodothyronine, retinoic acid (to inhibit cell differentiation) and the antibiotic GA-1000, according to the manufacturer's instructions.

Normal tissues, cancer-derived cell lines and primary tumors

Liver, kidney, breast, myometrium, endometrium, prostate, spleen and tonsil tissues were obtained from the Mayo Pathology Department (Rochester, MN, USA) and were determined to be normal or benign by a pathologist. The brain tissue was purchased from Ambion, Inc. (Austin, TX, USA). NHBE cells (Cambrex) were used and considered as normal lung. The cancerous ovarian samples all corresponded to serous ovarian cancers, either ovarian cancer-derived cell lines or primary tumors. As a control, NCT expression in cancerous ovarian tissues was compared with NCT expression in short-term cultures of normal ovarian epithelial cells. For the breast samples, expression in breast cancer and in normal breast tissues was compared. Each cancer-derived cell line was grown using the conditions recommended by its supplier. For NCT cancer expression studies, 20 primary serous ovarian tumors and 17 primary breast tumors were examined. We also generated a panel of 48 primary endometrial tumors and matched blood samples for sequencing experiments. All of these were endometrioid endometrial cancers. Each primary tumor sample had slides produced for H&E staining that were analyzed by qualified pathologists. Cancerous tissues with >80% tumor cells were used for RNA extraction. Total RNA was extracted using the Gentra Systems Versagene Total RNA tissue kit and DNase kit (Minneapolis, MN, USA).

Whole-genome tiling array design

The design of the GeneChip® Human 35 bp Tiling Array 1.0R Set (Affymetrix) is based on the human genome version 34 according to the NCBI versioning system, as downloaded from the UCSC website (http://www.genome.ucsc.edu/) and repeat masked. Probes are spaced at roughly 35 bp resolution (center-to-center of each consecutive 25mer), subject only to requirements of synthesis and probe quality, and are subdivided according to their genome position into 14 separate microarray designs of the same overall and feature size. However, the oligonucleotides could be spaced at larger intervals when the tiled array approaches a repeat. A probe pair is formed by a 25mer, called a perfect match, identical to the genome sequence at the selected position, and another one, called a mismatch, that differs from the first in the central base. The entire tiling array design contained a total of 41 804 804 probe pairs representing 1 364 427 919 (91%) nucleotides of the repeat masked (http://www.repeatmasker.org/) sequence that could be grouped into windows containing at least five probes.
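The probe-pair layout described above can be illustrated with a short Python sketch. It is not the array manufacturer's design code: the function names are hypothetical, the fixed 35 bp step ignores the larger spacing near repeats, and substituting the complementary base at the central position is an assumption (the text states only that the mismatch differs from the perfect match in the central base).

```python
# Assumption: the mismatch carries the complement of the central base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def make_probe_pair(genome_seq, start, length=25):
    """Return (perfect_match, mismatch) for the 25mer beginning at `start`."""
    pm = genome_seq[start:start + length].upper()
    if len(pm) != length or set(pm) - set(COMPLEMENT):
        raise ValueError("window must be a full-length A/C/G/T sequence")
    mid = length // 2  # index 12, i.e. the 13th base of a 25mer
    mm = pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]
    return pm, mm

def tile(genome_seq, step=35, length=25):
    """Yield (start, perfect_match, mismatch) at a fixed center-to-center step."""
    for start in range(0, len(genome_seq) - length + 1, step):
        yield (start, *make_probe_pair(genome_seq, start, length))
```

Because consecutive 25mers start 35 bp apart, about 10 bp between neighbouring probes are not directly interrogated, which is why a region of consecutive hybridizing probes only approximates the extent of a transcript.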

Tiling array experimental procedure

The experimental procedure for the tiling array project was conducted in conjunction with the Microarray Core of the Mayo Clinic. Briefly, total RNA was isolated from NHBE cells cultured in T-150 flasks in triplicate using the Gentra Systems Versagene Total RNA Cell kit, including their DNase kit procedure. To focus on larger transcripts, the total RNA was isolated using a column with low affinity for RNAs of <80 nt, thus eliminating many small RNA species and most of the known mature miRNA species, but not their precursors. RNA quality was assessed by NanoDrop measurement and by Agilent traces of each sample.

The Microarray Core Laboratory used the GeneChip® WT Double-Stranded DNA Terminal Labeling Kit and GeneChip® WT Amplified Double-Stranded cDNA Synthesis Kit from Affymetrix (Santa Clara, CA, USA). This protocol entails first- and second-strand cDNA synthesis using random hexamers, RNA removal and cDNA purification, a quality control cDNA step, a cDNA fragmentation step, TdT labeling, prehybridization of the chips and the final hybridization of the labeled cDNA onto the GeneChip® Human 35 bp Tiling Array 1.0R set. Labeled samples were re-hybridized up to a total of four chips, as suggested by the protocol. Each tiling array chip was completed in triplicate (i.e. three complete 14 chip sets were hybridized and scanned). Although this tiling array platform does specify from which strand the signals arise, the RNA had been converted into double-stranded cDNA, so it is impossible to distinguish whether a detected signal reflects transcription from the sense or the antisense strand.

Each array was scanned using the Affymetrix GeneChip® 300 G7 scanner. The GeneChip® operating software automatically generated four files required for data analysis, including the CEL file. Affymetrix provided two software programs for initial data analysis, Tiling Analysis Software (TAS) and Integrated Genome Browser (IGB). TAS v1.1 user guide (Affymetrix) provides detailed description of its analytical capabilities including various uses of quality control measures. Briefly, the analyses provided within the TAS application included analyzing feature intensity data stored in CEL files to produce signal and P-values for each genomic position, computation of genomic intervals based on those computed signal and P-values and computation of summary statistics and visualizations for assessing the quality of the array data. The results of this analysis were imported into applications such as IGB or the UCSC Human Genome Browser. With IGB, annotation variations were compared in different datasets. IGB combined in one viewer its own experimental or computational results, common reference information and access to public and private data banks. More detailed description of IGB's capabilities is documented in IGB user's guide (Affymetrix).

Non-coding region selection and conservation analysis

In this study, we defined a transcript [termed a ‘transfrag’ or ‘transcriptionally active region (TAR)’ by others (3,4)] as a region of nucleotide sequence from which a string of consecutive probes display significant signal intensity from the tiling array, and such signals can thus be interpreted as transcriptional activity. The region of consecutive hybridizing probes does not necessarily equate with the size of a transcript. The signal could represent multiple RNAs that share a common genomic sequence. Furthermore, it is possible that one region could code for multiple different RNAs.

During tiling array data analysis, signal intensity threshold levels were set to identify only the most highly expressed TARs in NHBE cells that were >400 nt long. TARs were considered only if their signal fell within the 99.5th signal intensity percentile. TAR number, size, origin and chromosomal location [probe coordinate positions from version 34 of the human genome were taken directly from the tiling arrays and were converted to the most current edition of the human genome, version 36 (Table 2)] of all intensities within this percentile were documented. Of these, we selected only the non-protein coding (intergenic and intronic) regions contained among the highest expressed probes from the tiling array. All non-coding regions were then examined further using the UCSC Human Genome Browser (http://www.genome.ucsc.edu/) for visualization against genomic annotations [including known genes, refseq, Ensembl (version 43) predicted genes], human messenger RNA (mRNA) sequence alignment, spliced and human ESTs, superfamily, EvoFold, sno/miRNA, poly(A), CpG islands, dbSNP (build 126) density, G–C content (gc5Base) and repetitive elements. The absence of poly(A) regions within and around each NCT further verifies that their sequence is not associated with mRNA. To determine sequence conservation, NCT sequences were aligned among 17 vertebrate species (armadillo, dog, chicken, chimp, cow, elephant, frog, Fugu, human, macaque, mouse, opossum, rabbit, rat, tenrec, Tetraodon and zebrafish) based on a phylogenetic hidden Markov model, phastCons and Multiz alignments (see http://www.genome.ucsc.edu/ for more details). We also used Mulan to double check evolutionary conservation across multiple species (http://www.mulan.dcode.org/). A subset of 15 NCTs was chosen for further characterization in subsequent northern blot, real-time RT–PCR and sequencing experiments.
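The selection criteria above (signal within the 99.5th intensity percentile, consecutive probes, span >400 nt) can be sketched in Python as follows. This is a minimal illustration and not the TAS implementation used in the study: the function names, the simple rank-based percentile and the `max_gap` parameter (the largest probe-to-probe distance still treated as consecutive) are assumptions.

```python
def percentile_threshold(signals, pct=99.5):
    """Signal value at the given percentile (simple rank-based estimate)."""
    ordered = sorted(signals)
    k = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[k]

def find_tars(probes, threshold, min_span=400, max_gap=35, probe_len=25):
    """probes: (start, signal) tuples sorted by start on one chromosome.
    Returns (start, end) intervals of consecutive probes at or above
    `threshold` that together span at least `min_span` nucleotides."""
    tars, run = [], []

    def flush():
        # Keep the current run only if it covers enough sequence.
        if run and (run[-1] + probe_len) - run[0] >= min_span:
            tars.append((run[0], run[-1] + probe_len))

    for pos, signal in probes:
        if signal >= threshold and (not run or pos - run[-1] <= max_gap):
            run.append(pos)
        else:
            flush()
            run = [pos] if signal >= threshold else []
    flush()
    return tars
```

In practice the threshold would be taken from the full set of probe signals, for example `find_tars(probes, percentile_threshold([s for _, s in probes]))`. Raising `max_gap` would tolerate the wider probe spacing that occurs near masked repeats.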

Northern blotting

Northern blotting was performed to obtain rough estimates of transcript sizes and to validate expression in RNA. Total RNAs were resolved under denaturing conditions on a 2% agarose gel containing 5% formaldehyde. RNA was transferred onto a Trans-Blot Transfer Medium Pure Nitrocellulose Membrane (Bio-Rad) overnight and then fixed to the membrane with a UV crosslinker (Stratalinker, Stratagene). Probes were designed based on the NCT sequences taken directly from IGB (to make certain that sequences correspond to tiling array signals). All primers used to create probes were designed with the Primer 3 program (http://www.biotools.umassmed.edu/bioapps/primer3_www.cgi). PCR products created from these primers were used as probes that spanned a large region of each NCT. Probes were labeled with 17 pmol [α-32P] dNTP (GE Healthcare, Piscataway, USA) using the Megaprime DNA Labelling Systems kit (Amersham Biosciences, Piscataway, USA) according to the manufacturer's instructions. The hybridization was carried out for 16 h in a hybridization solution containing 50% formamide, 5× SSPE, 2× Denhardt's reagent, 0.1% SDS and 100 µg/ml salmon sperm DNA. Membranes were then exposed for 1–2 days on BioMax MR Scientific Imaging Film (KODAK, Rochester, NY, USA) at −80°C and then developed on a XOMAT machine. RNA Century Plus Marker (Ambion, Austin, TX, USA) was used to distinguish the size of resolved bands. The 0.5 and 1.0 kb bands from this marker, in addition to the 18S ribosomal RNA band (∼1.9 kb), were used to determine a rough estimate of transcript size.

Real-time RT–PCR

RNA (1–2 µg total RNA from each sample) was primed with random-hexamers in a volume of 20 µl and reverse transcribed into cDNA with cDNA synthesis kits (Invitrogen, Carlsbad, CA, USA). After cDNA synthesis, the volume of each sample was increased to 200 µl with water. Of this diluted cDNA, 3 µl was utilized in each real-time RT–PCR reaction.

To determine NCT expression, two pairs of PCR primers were created for each region using the Primer 3 program (http://www.biotools.umassmed.edu/bioapps/primer3_www.cgi). Primers were optimized for real-time RT–PCR with β-actin as a control gene and then with the transcript region of interest. Real-time RT–PCR was conducted using an ABI 7900HT Fast Real-time PCR system. Once the optimal concentration of primers produced a linear curve relating to the input concentration of cDNA, tissue samples were run in triplicate for each tested transcript. To normalize the expression levels (ΔCt), the threshold cycle (Ct) for each transcript was subtracted from the Ct of the more abundantly expressed control gene (β-actin). Several other housekeeping genes (B2M, GAPD, GUSB, HPRT1, PGK, PPIA, RPL13A, TBP and TFRC) were also used to verify and confirm ΔCt results from β-actin. For normal tissue expression, cDNA obtained from three different short-term cultures of normal ovarian surface epithelium (VOSE cultures) and from four normal breast samples was utilized. The expression range of an NCT in the normal samples was compared with that in the primary tumors. Any change greater than 4-fold (a ΔCt of more than 2 greater or less than the range observed in the normal samples) was considered for statistical analysis.
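The normalization and the 4-fold criterion described above can be written out as a short Python sketch. The function names are hypothetical, and the conversion from a ΔCt difference to a fold change assumes that the amount of product doubles with every PCR cycle (approximately 100% amplification efficiency), so a difference of 2 cycles corresponds to 2^2 = 4-fold.

```python
def delta_ct(ct_control, ct_transcript):
    """ΔCt as defined in the text: transcript Ct subtracted from the
    control (β-actin) Ct. A higher ΔCt means higher expression."""
    return ct_control - ct_transcript

def fold_change(delta_delta_ct):
    """Fold change for a ΔCt difference, assuming doubling per cycle."""
    return 2.0 ** delta_delta_ct

def classify(tumor_dct, normal_dcts, cutoff=2.0):
    """Flag a tumor whose ΔCt lies more than `cutoff` cycles outside the
    range observed in the normal samples."""
    low, high = min(normal_dcts), max(normal_dcts)
    if tumor_dct > high + cutoff:
        return "up"
    if tumor_dct < low - cutoff:
        return "down"
    return "unchanged"
```

Comparing against the full range of the normal samples, rather than their mean, is the more conservative choice: a tumor is flagged only when it differs by more than 4-fold from every normal sample.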

Mutation analysis

To examine whether NCTs could be potential targets for mutation, 12 of the 15 NCTs were sequenced across a panel of 42 cancer-derived cell lines from various tissues, and their sequences were compared with those of 11 normal samples. NCTs that were observed having possible mutations in the cancer-derived cell line panel were subsequently examined in a panel of 48 primary endometrial tumors and matched blood samples. All of these were endometrioid endometrial cancers. Unfortunately, we did not have matched blood samples for either the primary ovarian or breast tumor samples studied. In addition, we did not have corresponding normal samples for any of the cancer-derived cell lines used in this study. To determine whether NCTs are mutational targets, we first designed primers to flank the entire region of interest. In the case of larger NCTs, the sequence was subdivided into smaller overlapping regions in order to obtain a manageable sequencing size. PCR was conducted to create an amplicon for each NCT in a panel of 54 genomic DNA samples comprising 42 cancer-derived cell lines, 11 normal blood samples and one water negative control (Supplementary Material, Table S3). Genomic DNA was isolated with the Puregene™ DNA Purification System (Gentra). HotStar reagents from Qiagen were used in PCR reactions. Sample preparation for sequencing involved three steps: (i) 5 µl of PCR product was mixed with 1 µl of a 1:10 dilution of Exonuclease I and incubated at 37°C for 15 min, 80°C for 15 min and cooled on ice; (ii) 2 µl of a 1:1 dilution of shrimp alkaline phosphatase with 10× buffer was added to the previous reaction mixture and incubated at 37°C for 15 min, 80°C for 15 min and cooled on ice and (iii) finally, 1 µl of primer was added and samples were sent to the DNA Sequencing Laboratory at Mayo Clinic to be sequenced on ABI 3730XL sequencers. Mutational analysis was conducted using Mutation Surveyor version 3.01 (SoftGenetics).
Once a possible mutation was observed, sequencing was conducted with the primer from the opposite end of the region to verify the mutation. Next, nested primers were used to ensure the mutations were valid. To ensure that these mutations were not natural polymorphisms, mutation regions were aligned and compared against all known SNPs from human genome databases (http://www.genome.ucsc.edu and http://www.ncbi.nlm.nih.gov). In addition, we had 11 normal samples in this panel to examine any natural polymorphic changes. Any sequence change that occurred in cancer-derived cell lines that corresponded to polymorphisms observed in the normal samples was not considered to be a mutation. Because so few of the ovarian and breast cancer cell lines (which are generally produced from more advanced stage cancers) had mutations, we did not sequence the primary breast and ovarian cancers examined in qPCR experiments. In addition, we did not have matched normal DNAs corresponding to these samples.
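The polymorphism filter described above reduces to a simple set comparison, sketched here in Python. The data layout (tuples of NCT name, position and change) and the function name are hypothetical, and the example records use placeholder names rather than data from this study.

```python
def filter_candidates(tumor_variants, normal_variants, known_snps):
    """Keep sequence changes that are neither seen in the normal panel
    nor located at a known SNP position.

    tumor_variants, normal_variants: iterables of (nct, position, change)
    known_snps: set of (nct, position) pairs from SNP databases
    """
    seen_in_normals = set(normal_variants)
    return [
        v for v in tumor_variants
        if v not in seen_in_normals and (v[0], v[1]) not in known_snps
    ]

# Hypothetical example records (placeholders, not study data).
tumor = [("NC_A", 67, "T>TC"), ("NC_B", 120, "G>GT"), ("NC_C", 10, "A>G")]
normal = [("NC_C", 10, "A>G")]          # also present in a normal sample
snps = {("NC_B", 120)}                  # annotated SNP position
candidates = filter_candidates(tumor, normal, snps)
```

A change that passes this filter is still only a candidate: without matched normal DNA from the same individual, a rare polymorphism cannot be excluded, which is why the matched endometrial panel was sequenced.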

Statistical analysis

Only NCTs displaying altered expression levels of at least 4-fold, consistently in each of the triplicate qPCR experiments, were considered for statistical analysis when comparing normal versus cancer samples from breast and ovary. Dunnett's one-way analysis of variance (ANOVA) was used to compare the difference in expression (46). The expression in cancer samples was considered significantly different from that in normal samples if the P-value was ≤0.05.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG Online.

FUNDING

We also acknowledge the following grants: DOD DAMD 17-00-1-0296 (to D.I.S.) and The Mayo Cancer Genetic Epidemiology Training Program (R25-CA-92049-03).

ACKNOWLEDGEMENTS

We thank members of the Breast Cancer Program and Kimberly Kalli of the Ovarian Cancer Program of the Mayo Clinic Cancer Center.

Conflict of Interest statement. None declared.

REFERENCES

1
Carninci
P.
Kasukawa
T.
Katayama
S.
Gough
J.
Frith
M.C.
Maeda
N.
Oyama
R.
Ravasi
T.
Lenhard
B.
Wells
C.
, et al.  . 
The transcriptional landscape of the mammalian genome
Science
 , 
2005
, vol. 
309
 (pg. 
1559
-
1563
)
2
Kapranov
P.
Cawley
S.E.
Drenkow
J.
Bekiranov
S.
Strausberg
R.L.
Fodor
S.P.
Gingeras
T.R.
Large-scale transcriptional activity in chromosomes 21 and 22
Science
 , 
2002
, vol. 
296
 (pg. 
916
-
919
)
3
Bertone
P.
Stolc
V.
Royce
T.E.
Rozowsky
J.S.
Urban
A.E.
Zhu
X.
Rinn
J.L.
Tongprasit
W.
Samanta
M.
Weissman
S.
, et al.  . 
Global identification of human transcribed sequences with genome tiling arrays
Science
 , 
2004
, vol. 
306
 (pg. 
2242
-
2246
)
4
Kampa
D.
Cheng
J.
Kapranov
P.
Yamanaka
M.
Brubaker
S.
Cawley
S.
Drenkow
J.
Piccolboni
A.
Bekiranov
S.
Helt
G.
, et al.  . 
Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22
Genome Res.
 , 
2004
, vol. 
14
 (pg. 
331
-
342
)
5
Cheng
J.
Kapranov
P.
Drenkow
J.
Dike
S.
Brubaker
S.
Patel
S.
Long
J.
Stern
D.
Tammana
H.
Helt
G.
, et al.  . 
Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution
Science
 , 
2005
, vol. 
308
 (pg. 
1149
-
1154
)
6
Johnson
J.M.
Edwards
S.
Shoemaker
D.
Schadt
E.E.
Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments
Trends Genet.
 , 
2005
, vol. 
21
 (pg. 
93
-
102
)
7
Eddy
S.R.
Non-coding RNA genes and the modern RNA world
Nat Rev Genet
 , 
2001
, vol. 
2
 (pg. 
919
-
929
)
8
Mattick
J.S.
RNA regulation: a new genetics?
Nat. Rev. Genet.
 , 
2004
, vol. 
5
 (pg. 
316
-
323
)
9
Kapranov
P.
Cheng
J.
Dike
S.
Nix
D.A.
Duttagupta
R.
Willingham
A.T.
Stadler
P.F.
Hertel
J.
Hackermuller
J.
Hofacker
I.L.
, et al.  . 
RNA maps reveal new RNA classes and a possible function for pervasive transcription
Science
 , 
2007
, vol. 
316
 (pg. 
1484
-
1488
)
10
Mattick
J.S.
Makunin
I.V.
Small regulatory RNAs in mammals
Hum. Mol. Genet.
 , 
2005
, vol. 
14
 (pg. 
R121
-
R132
)
11
Huttenhofer
A.
Schattner
P.
Polacek
N.
Non-coding RNAs: hope or hype?
Trends Genet.
 , 
2005
, vol. 
21
 (pg. 
289
-
297
)
12
Mattick
J.S.
Makunin
I.V.
Non-coding RNA
Hum. Mol. Genet.
 , 
2006
, vol. 
15
 (pg. 
R17
-
R29
)
13
Pollard
K.S.
Salama
S.R.
Lambert
N.
Lambot
M.A.
Coppens
S.
Pedersen
J.S.
Katzman
S.
King
B.
Onodera
C.
Siepel
A.
, et al.  . 
An RNA gene expressed during cortical development evolved rapidly in humans
Nature
 , 
2006
, vol. 
443
 (pg. 
167
-
172
)
14
Costa
F.F.
Non-coding RNAs: new players in eukaryotic biology
Gene
 , 
2005
, vol. 
357
 (pg. 
83
-
94
)
15
Calin
G.A.
Croce
C.M.
MicroRNA signatures in human cancers
Nat. Rev. Cancer
 , 
2006
, vol. 
6
 (pg. 
857
-
866
)
16
Lau
N.C.
Seto
A.G.
Kim
J.
Kuramochi-Miyagawa
S.
Nakano
T.
Bartel
D.P.
Kingston
R.E.
Characterization of the piRNA complex from rat testes
Science
 , 
2006
, vol. 
313
 (pg. 
363
-
367
)
17
Grivna
S.T.
Pyhtila
B.
Lin
H.
MIWI associates with translational machinery and PIWI-interacting RNAs (piRNAs) in regulating spermatogenesis
Proc. Natl. Acad. Sci. USA
 , 
2006
, vol. 
103
 (pg. 
13415
-
13420
)
18
Akhtar
A.
Dosage compensation: an intertwined world of RNA and chromatin remodelling
Curr. Opin. Genet. Dev.
 , 
2003
, vol. 
13
 (pg. 
161
-
169
)
19
Chen
W.
Bocker
W.
Brosius
J.
Tiedge
H.
Expression of neural BC200 RNA in human tumours
J. Pathol.
 , 
1997
, vol. 
183
 (pg. 
345
-
351
)
20
Chen
W.
Heierhorst
J.
Brosius
J.
Tiedge
H.
Expression of neural BC1 RNA: induction in murine tumours
Eur. J. Cancer
 , 
1997
, vol. 
33
 (pg. 
288
-
292
)
21
Wurdinger
T.
Costa
F.F.
Molecular therapy in the microRNA era
Pharmacogenomics J.
 , 
2007
, vol. 
7
 (pg. 
297
-
304
)
22
Furuno
M.
Pang
K.C.
Ninomiya
N.
Fukuda
S.
Frith
M.C.
Bult
C.
Kai
C.
Kawai
J.
Carninci
P.
Hayashizaki
Y.
, et al.  . 
Clusters of internally primed transcripts reveal novel long noncoding RNAs
PLoS Genet.
 , 
2006
, vol. 
2
 pg. 
e37
 
23
Eddy
S.R.
Computational genomics of noncoding RNA genes
Cell
 , 
2002
, vol. 
109
 (pg. 
137
-
140
)
24
Reis
E.M.
Nakaya
H.I.
Louro
R.
Canavez
F.C.
Flatschart
A.V.
Almeida
G.T.
Egidio
C.M.
Paquola
A.C.
Machado
A.A.
Festa
F.
, et al.  . 
Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer
Oncogene
 , 
2004
, vol. 
23
 (pg. 
6684
-
6692
)
25
Ji
P.
Diederichs
S.
Wang
W.
Boing
S.
Metzger
R.
Schneider
P.M.
Tidow
N.
Brandt
B.
Buerger
H.
Bulk
E.
, et al.  . 
MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer
Oncogene
 , 
2003
, vol. 
22
 (pg. 
8031
-
8041
)
26
Smith
N.G.
Brandstrom
M.
Ellegren
H.
Evidence for turnover of functional noncoding DNA in mammalian genome evolution
Genomics
 , 
2004
, vol. 
84
 (pg. 
806
-
813
)
27
Snyder
M.
Gerstein
M.
Genomics. Defining genes in the genomics era
Science
 , 
2003
, vol. 
300
 (pg. 
258
-
260
)
28
Calin
G.A.
Liu
C.G.
Ferracin
M.
Hyslop
T.
Spizzo
R.
Sevignani
C.
Fabbri
M.
Cimmino
A.
Lee
E.J.
Wojcik
S.E.
, et al.  . 
Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas
Cancer Cell
 , 
2007
, vol. 
12
 (pg. 
215
-
229
)
29
Lin
R.
Maeda
S.
Liu
C.
Karin
M.
Edgington
T.S.
A large noncoding RNA is a marker for murine hepatocellular carcinomas and a spectrum of human carcinomas
Oncogene
 , 
2007
, vol. 
26
 (pg. 
851
-
858
)
30
Reis
E.M.
Ojopi
E.P.
Alberto
F.L.
Rahal
P.
Tsukumo
F.
Mancini
U.M.
Guimaraes
G.S.
Thompson
G.M.
Camacho
C.
Miracca
E.
, et al.  . 
Large-scale transcriptome analyses reveal new genetic marker candidates of head, neck, and thyroid cancer
Cancer Res.
 , 
2005
, vol. 
65
 (pg. 
1693
-
1699
)
31
Ahituv
N.
Zhu
Y.
Visel
A.
Holt
A.
Afzal
V.
Pennacchio
L.A.
Rubin
E.M.
Deletion of ultraconserved elements yields viable mice
PLoS Biol.
 , 
2007
, vol. 
5
 pg. 
e234
 
32
Liu
A.Y.
Torchia
B.S.
Migeon
B.R.
Siliciano
R.F.
The human NTT gene: identification of a novel 17-kb noncoding nuclear RNA expressed in activated CD4+ T cells
Genomics
 , 
1997
, vol. 
39
 (pg. 
171
-
184
)
33
Pennacchio
L.A.
Ahituv
N.
Moses
A.M.
Prabhakar
S.
Nobrega
M.A.
Shoukry
M.
Minovitsky
S.
Dubchak
I.
Holt
A.
Lewis
K.D.
, et al.  . 
In vivo enhancer analysis of human conserved non-coding sequences
Nature
 , 
2006
, vol. 
444
 (pg. 
499
-
502
)
34. Prabhakar S., Noonan J.P., Paabo S., Rubin E.M. Accelerated evolution of conserved noncoding sequences in humans. Science, 2006, vol. 314 pg. 786
35. Yazgan O., Krebs J.E. Noncoding but nonexpendable: transcriptional regulation by large noncoding RNA in eukaryotes. Biochem. Cell Biol., 2007, vol. 85 (pg. 484-496)
36. Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W.J., Mattick J.S., Haussler D. Ultraconserved elements in the human genome. Science, 2004, vol. 304 (pg. 1321-1325)
37. Lindblad-Toh K., Wade C.M., Mikkelsen T.S., Karlsson E.K., Jaffe D.B., Kamal M., Clamp M., Chang J.L., Kulbokas E.J. III, Zody M.C., et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature, 2005, vol. 438 (pg. 803-819)
38. Siepel A., Bejerano G., Pedersen J.S., Hinrichs A.S., Hou M., Rosenbloom K., Clawson H., Spieth J., Hillier L.W., Richards S., et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res., 2005, vol. 15 (pg. 1034-1050)
39. Sironi M., Menozzi G., Comi G.P., Cagliani R., Bresolin N., Pozzoli U. Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum. Mol. Genet., 2005, vol. 14 (pg. 2533-2546)
40. Pollard K.S., Salama S.R., King B., Kern A.D., Dreszer T., Katzman S., Siepel A., Pedersen J.S., Bejerano G., Baertsch R., et al. Forces shaping the fastest evolving regions in the human genome. PLoS Genet., 2006, vol. 2 pg. e168
41. Lagos-Quintana M., Rauhut R., Meyer J., Borkhardt A., Tuschl T. New microRNAs from mouse and human. RNA, 2003, vol. 9 (pg. 175-179)
42. Berezikov E., Guryev V., van de Belt J., Wienholds E., Plasterk R.H., Cuppen E. Phylogenetic shadowing and computational identification of human microRNA genes. Cell, 2005, vol. 120 (pg. 21-24)
43. Babak T., Blencowe B.J., Hughes T.R. A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics, 2005, vol. 6 pg. 104
44. Pang K.C., Frith M.C., Mattick J.S. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet., 2006, vol. 22 (pg. 1-5)
45. Willingham A.T., Gingeras T.R. TUF love for ‘junk’ DNA. Cell, 2006, vol. 125 (pg. 1215-1220)
46. Tamhane A.C., Dunlop D.D. Multiple comparisons of means. Statistics and Data Analysis from Elementary to Intermediate, 2000, Upper Saddle River, NJ, Prentice-Hall, Inc. (pg. 475-476)