Traditional approaches to sequence analysis are widely used to guide therapy for patients with lung and colorectal cancer and for patients with melanoma, sarcomas (eg, gastrointestinal stromal tumor), and subtypes of leukemia and lymphoma. The next-generation sequencing (NGS) approach holds a number of potential advantages over traditional methods, including the ability to fully sequence large numbers of genes (hundreds to thousands) in a single test and simultaneously detect deletions, insertions, copy number alterations, translocations, and exome-wide base substitutions (including known “hot-spot mutations”) in all known cancer-related genes. Adoption of clinical NGS testing will place significant demands on laboratory infrastructure and will require extensive computational expertise and a deep knowledge of cancer medicine and biology to generate truly useful “clinically actionable” reports. It is anticipated that continuing advances in NGS technology will lower the overall cost, speed the turnaround time, increase the breadth of genome sequencing, detect epigenetic markers and other important genomic parameters, and become applicable to smaller and smaller specimens, including circulating tumor cells and circulating free DNA in plasma.
During the past 13 years since the launch of the targeted therapy era, there has been a keen interest in determining the cancer cell gene sequence to predict the response to anticancer drugs (Table 1).1–4 In late 1998, the era of personalized oncology began with the simultaneous regulatory approval of the anti–HER2-targeted monoclonal antibody therapeutic, trastuzumab, for the treatment of HER2-overexpressing breast cancer and the immunohistochemical-based diagnostic HercepTest (DAKO, Carpinteria, CA) for the identification of patients who are eligible for treatment.5 During the ensuing 12 years, interest in these “drug-test” combinations, also referred to as “companion diagnostics,” has increased exponentially as more anticancer agents have been approved and more new drugs have entered clinical trials based on biomarker profiles.4 The selection of anti–epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors in non–small cell lung cancer (NSCLC) and the use of anti-EGFR monoclonal antibody therapeutics in colorectal cancer (CRC) are now firmly based on pretreatment sequence determinations for hot spots in the EGFR (NSCLC) and KRAS (CRC) genes.1–3
Recently, 2 novel targeted small-molecule drugs have advanced into late-stage clinical trials that feature a genetic sequence detection–based companion diagnostic. PLX-4032 targets papillary thyroid cancers and metastatic malignant melanomas that feature the V600E mutation of the BRAF gene.6 The second drug, crizotinib, has shown impressive efficacy for the treatment of NSCLCs that feature an EML4-ALK translocation.7 The recent introduction of the more thorough and in-depth sequencing technologies (next-generation sequencing [NGS] or massively parallel sequencing) has heightened concerns that the traditional so-called Sanger-type sequencing method used to sequence the human genome 10 years ago will prove incapable of scaling sufficiently to support the breadth of information required to manage an ever-growing arsenal of targeted anticancer drugs. Thus, as PLX-4032 and crizotinib approach regulatory approval in 2011, a significant debate has arisen as to which technology approach is best for detecting gene sequence abnormalities in clinical samples of solid tumors and hematologic malignancies.
Traditional Gene Sequencing Approaches in Clinical Oncology
Traditional DNA sequencing for cancer patient management (Table 2) was initially focused on the detection of cancer predisposition in the patient’s germline.8,9 Currently, traditional sequencing approaches remain the cornerstones for detecting BRCA1 and BRCA2 mutations and for diagnosing hereditary nonpolyposis colorectal cancer (hMLH1 and hMSH2), familial adenomatous polyposis coli, and other familial cancer syndromes8,9 and have also been used to genotype germline DNA to determine polymorphisms associated with anticancer drug metabolism and toxic effects.10–12 In addition, DNA sequence determination has expanded in routine oncology practice to include tumor cell assessments that predict the efficacy of drugs on the market and in various stages of clinical development.13–15
The original sequencing of the human genome used the chain-termination sequencing method developed by Sanger and colleagues.16–18 The modern Sanger method uses automated sequencing instruments that detect fluorescently labeled nucleotide sequences. DNA strands of increasing length are resolved by capillary electrophoresis after DNA elongation along single-stranded amplified DNA templates is randomly terminated by incorporating fluorescent dideoxynucleotide chain terminators.18 Sanger sequencing has been used to identify many clinically significant DNA variants. Sanger and other traditional sequencing platforms (Table 1) have been used for the following: (1) EGFR sequencing in NSCLC to select patients for treatment with EGFR tyrosine kinase inhibitors19–21; (2) KRAS mutation testing of CRC specimens to predict resistance to the anti-EGFR antibody therapeutics, cetuximab and panitumumab22–24; (3) BRAF mutation testing to predict anti-EGFR antibody resistance in CRC25–29; (4) BRAF mutation testing to select patients with metastatic thyroid cancers and melanoma for treatment with the BRAF inhibitor PLX-403230; (5) sequence-based detection of the EML4-ALK translocation in NSCLC to select patients for treatment with crizotinib31–33; and (6) CKIT sequencing for treatment of gastrointestinal stromal tumors, a subset of malignant melanomas that feature activating CKIT mutations,34–38 and a number of hematologic malignancies, including the BCR-ABL translocation in chronic myelogenous leukemia and the RARA-PML translocation in acute promyelocytic leukemia. HER2 gene amplification is not readily detected by traditional Sanger sequencing methods but is detected by NGS technologies.
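The chain-termination principle described above can be illustrated with a short sketch. It is a toy model only: the function name `sanger_read` and the example template are hypothetical, every possible termination length is assumed to be present exactly once, and real instruments must also resolve noisy, overlapping fluorescence peaks.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sanger_read(template, seed=0):
    """Reconstruct the synthesized strand from chain-terminated fragments."""
    # Strand synthesized along the template, written 5'->3'.
    synthesized = "".join(COMPLEMENT[base] for base in reversed(template))
    # Random termination yields one fragment for every possible length; only
    # the labelled terminal base of each fragment is observed.
    fragments = [(n, synthesized[n - 1]) for n in range(1, len(synthesized) + 1)]
    # Fragments start out as an unordered mixture in the reaction tube.
    random.Random(seed).shuffle(fragments)
    # Capillary electrophoresis resolves the fragments by increasing length.
    fragments.sort(key=lambda frag: frag[0])
    # Reading the terminal labels in length order gives the sequence.
    return "".join(label for _, label in fragments)

print(sanger_read("GGTGGC"))
```

The point of the sketch is that the sequence is never read directly; it is inferred from the order of fragment lengths and the identity of each terminal dideoxynucleotide.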
Pyrosequencing differs from the chain-termination method by relying on an integrated biochemical reaction to detect pyrophosphate released by DNA polymerase after each nucleotide addition during the synthesis of the DNA strand complementary to the DNA strand being sequenced.39 This method is also referred to as a “sequencing by synthesis” method. As each new base is added to the complementary DNA strand by DNA polymerase, the pyrophosphate released from the added nucleotide is combined with adenosine 5′ phosphosulfate by a sulfurylase to form adenosine triphosphate. The adenosine triphosphate is then consumed by luciferase as it converts luciferin to oxyluciferin, releasing photons as visible light that is detected with a high-sensitivity charge-coupled device (CCD) camera.
Pyrosequencing has been shown to be more sensitive than Sanger sequencing and also provides a measure of the percentage of DNA that harbors detected mutations. However, pyrosequencing is limited in the length of the template DNA strand that can be sequenced, which is significantly shorter than that for Sanger chain-termination sequencing, and it is prone to errors in reading through homopolymer sequences. Thus, pyrosequencing is most often applied in clinical settings focused on hot-spot sequencing of short-length DNA segments or specific exons where mutations are commonly found to determine mutation status of oncogenes.40–42 Finally, pyrosequencing is better suited for sequencing the heavily fragmented, short-strand-length DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissue sections than is the Sanger method.
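Why homopolymer runs are error-prone in pyrosequencing can be seen in a minimal sketch of an idealized flow-based readout. The flow order `TACG`, the function names, and the noisy signal values are assumptions chosen for illustration; they are not taken from any instrument's software.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Ideal light signal per nucleotide flow while synthesizing `read`."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # Every consecutive matching base is incorporated in the same flow,
        # so the signal is proportional to the homopolymer length.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals

def call_bases(signals):
    """Convert per-flow signals back to bases by rounding each signal."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)

ideal = flowgram("TTTAG")
print(ideal)              # one signal per flow; the T flow carries all 3 T's
print(call_bases(ideal))  # recovers the read from ideal signals

# A hypothetical noisy measurement of the same read: the 3-base T run is
# measured as 2.4 and rounds down, dropping one base from the call.
noisy = [("T", 2.4), ("A", 1.1), ("C", 0.1), ("G", 0.9)]
print(call_bases(noisy))
```

Because a run of identical bases is reported as a single signal of proportionally greater intensity, a small measurement error on a long run changes the called length, whereas isolated bases are far more tolerant of the same noise.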
Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry has been applied to clinical samples for DNA sequence analysis with very high resolution and sensitivity, especially for the detection of single nucleotide polymorphisms (SNPs) in germline DNA.43–45 MALDI-TOF has also been used to find disease susceptibility–associated genetic markers and biomarkers of drug response.44,45 MALDI-TOF is particularly adept for finding frameshift and heterozygous mutations. Although MALDI-TOF has been used to evaluate the products of Sanger sequencing reactions, the short read lengths that can be detected in the mass spectrometry detection window preclude using this technique for large-scale sequencing projects.
Allele-specific real-time polymerase chain reaction (RT-PCR) is a method of determining the sequence of targeted hot spots in the cancer cell genome, known to be commonly mutated in cancer, that are important to guide therapy decisions for such tumors as CRC and NSCLC. A variety of commercial PCR-based systems have been developed to perform DNA sequencing and have shown high sensitivity for mutation detection in closed-tube systems designed to reduce the risk of sample contamination.46,47 Primer and probe sets of oligonucleotides are designed to detect specific mutations of clinical interest when run in the closed RT-PCR system. The probes are designed for specific mutations and will not detect other mutations, deletions, or translocations involving related genes. In NSCLC, this method is reported to be sensitive enough for detecting a KRAS mutation at a frequency as low as 1% of the total DNA extracted from an FFPE specimen.48
Quantitative PCR Melting Curve Analysis
This approach is based on the principle that homogeneous DNA amplicons will melt at a characteristic temperature and that the melting curves of DNA products produced by PCR amplification can be analyzed to determine the presence of heterogeneous amplicon products. In addition, any double-stranded amplicon containing a mismatch along its length will dissociate (“melt”) at a lower temperature than expected. When a mixture of DNA amplicons is present, each has a characteristic melting temperature based on its length and sequence composition that can then identify the specific DNA mutations of interest.49 When mutated DNA and wild-type DNA amplicons are present in a PCR mixture and they are denatured and reannealed, melt curves characteristic of the homozygous and heterozygous products can be generated, indicating the sequence composition of the mixture.
A variation of this method is the PCR clamp method, which uses a peptide nucleic acid probe to block the amplification of the wild-type sequence in a DNA sample while permitting amplification of a specific mutation.50 Although this method has high sensitivity, it cannot be used to calculate the relative percentage of mutated DNA, nor can it discriminate the exact nature or location of a sequence alteration.
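The dependence of melting temperature on sequence composition, which underlies the melting curve approach described in this section, can be sketched with the Wallace rule, a rough rule of thumb for short oligonucleotides (2 °C per A or T, 4 °C per G or C). This is an assumption made for illustration: clinical instruments analyze measured fluorescence curves of longer amplicons, and the rule does not model the additional destabilization of mismatched heteroduplexes. The sequences below are illustrative only.

```python
def wallace_tm(sequence):
    """Approximate melting temperature (deg C) of a short oligonucleotide."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    # Wallace rule: G/C pairs contribute more stability than A/T pairs.
    return 2 * at + 4 * gc

wild_type = "GTTGGAGCTGGTGGCGTAGG"  # illustrative 20-base sequence
mutant    = "GTTGGAGCTGATGGCGTAGG"  # same sequence with a single G>A change

print(wallace_tm(wild_type), wallace_tm(mutant))
print(wallace_tm(wild_type) - wallace_tm(mutant))
```

Even in this simplified model a single base substitution shifts the predicted melting temperature, which is the kind of difference that melting curve analysis exploits.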
NGS Sequencing Technologies
The initial near-complete human genome sequence assembly, at the time of its publication in 2001, required more than 12 years of sequencing at multiple laboratories at a cost of more than $3 billion.17,18 Today, the continuous demand for more rapid and low-cost sequencing has driven the development of novel approaches designed to parallelize the sequencing process.51–54 In comparison with traditional Sanger sequencing and other sequence analysis methods, these new massively parallel or next-generation strategies (Table 3)55 have increased sequencing rates by orders of magnitude and driven down the per base sequencing cost significantly.56,57 It is now widely anticipated that NGS will enable the in-depth characterization of the cancer cell genome and further advance the fields of molecular pathology and personalized medicine for patients with cancer.
Illumina HiSeq and Genome Analyzer Systems
The massively parallel signature sequencing developed by Lynx Therapeutics was the second or next-generation approach to DNA sequencing. The original Lynx Therapeutics platform was a microsphere (bead)-based system that read nucleotides in groups of 4 via an adapter ligation and adapter decoding strategy using reversible dye terminators.58 Lynx Therapeutics (Hayward, CA) merged into Solexa, which was subsequently acquired by Illumina. This short read sequencing approach is now incorporated into a fluidic flow cell design (HiSeq and Genome Analyzer systems, Illumina, San Diego, CA) with 8 individual lanes. The flow cell surface is populated with capture oligonucleotide anchors, which hybridize the appropriately modified DNA segments of a sequencing library generated from a genomic DNA sample (Image 1, Image 2, and Image 3). By a process called “bridge amplification,” captured DNA templates are amplified in the flow cell by “arching” over and hybridizing to an adjacent anchor oligonucleotide primer.58 Actual sequencing is performed by hybridizing a primer complementary to the adapter sequence, then cyclically adding DNA polymerase and a mixture of 4 differently colored fluorescent reversible dye terminators to the captured DNA in the flow cell.58 By using this approach, nonmodified DNA fragments and unincorporated nucleotides are washed away, while captured DNA fragments are extended 1 nucleotide at a time. After each nucleotide-coupling cycle takes place, the flow cell is scanned, and digital images are acquired to record the locations of fluorescently labeled nucleotide incorporations. Following imaging, the fluorescent dye and the terminal 3′ blocker are chemically removed from the DNA before the next nucleotide coupling cycle (Figure 1).58
The Illumina technique is the most widely used NGS platform, but it is limited by a relatively low multiplexing capability.59 A number of technical issues, particularly due to aberrant nucleotide incorporation rates driven by polymerase error incorporating the modified deoxynucleotides, place major responsibility on the roles of the bioinformaticists and computational biologists to interpret the sequencing results that are produced by the Illumina systems.60,61 The Illumina system has been applied in programs for gene discovery, whole exome analysis, and SNP detection by resequencing.62 Examples of use of the Illumina HiSeq technique on clinical surgical pathology–processed tissue samples are shown in Image 1.
First introduced in 2005, the 454 Genome Sequencer FLX Titanium System (Roche, Branford, CT) NGS platform relies on highly parallel PCR reactions taking place in tiny emulsions composed of a primer-coated bead with a single captured DNA template encased with the DNA polymerase, primers, and nucleoside triphosphates (NTPs) necessary for PCR in an oil droplet. PCR amplification results in each bead becoming coated with a single DNA amplicon. The emulsions are broken, and the DNA-coated beads are loaded onto an array of picoliter wells for the sequencing reaction.55,63 Pyrosequencing is performed over the picoliter well array, and the nucleotide additions are visualized and located by a fiberoptic-coupled imaging camera.
The system provides longer read lengths than other NGS technologies.62 The 454 NGS platform also offers relatively fast instrument run times but is limited by the contamination risks associated with emulsion PCR and by elevated error rates in genetic regions rich in homopolymer repeats.62 454 sequencing has mainly been applied to targeted capture sequencing, high-resolution confirmatory sequencing for other NGS platforms, and plant and insect whole genome mapping.5,63,64
The Heliscope platform involves fragmenting the sample DNA and performing polyadenylation at the 3′ ends of the fragments. Denatured polyadenylated strands are captured by hybridization to poly(dT) oligonucleotides immobilized on a flow-cell surface. This approach was the first method to successfully perform single molecule sequencing.65 The flow cell is cyclically flooded with fluorescently labeled deoxynucleoside triphosphates (dNTPs) in the presence of DNA polymerase, which incorporates nucleotides by extending from the oligo-dT primer. The flow cell is imaged at each cycle using a CCD camera, allowing the identification of the location of each nucleotide incorporation event. As in other systems, the fluorescent label is cleaved and washed away before each subsequent sequencing cycle.62,65 The Heliscope system is considered to provide the least biased DNA sequence, which is its strength, although it has relatively high NTP incorporation error rates compared with competing NGS platforms.62
SOLiD sequencing (Supported Oligonucleotide Ligation and Detection) is based on DNA ligase–mediated oligonucleotide ligation after PCR amplification in an emulsion format.64 The primers in SOLiD NGS are progressively offset to allow the adapter bases to be sequenced when used in conjunction with the color-space coding for determining the template sequence by deconvolution.58 In each cycle, fluorescent signals are captured by CCD camera imaging, the ligated probes are enzymatically cleaved, and, after washing, the process is repeated. The SOLiD approach has been used in applications similar to those of the Illumina NGS, including whole genome sequencing, whole exome capture and sequencing, and SNP discovery.65 Strengths of the SOLiD approach include a reduction in sequencing error rates relative to the Illumina NGS through the use of 2-base encoding. Drawbacks of the SOLiD system have been its relatively long run times and complex analysis requirements.62
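How 2-base encoding helps separate true variants from measurement errors can be shown with a small sketch. It assumes the commonly described four-color scheme in which each color reports a pair of adjacent bases and equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3). The function names and sequences are hypothetical, and a production pipeline aligns reads in color space rather than decoding them naively as done here.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"

def encode(sequence):
    """Return (first base, colors), one color per adjacent base pair."""
    colors = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b]
              for a, b in zip(sequence, sequence[1:])]
    return sequence[0], colors

def decode(first_base, colors):
    """Deconvolute colors into bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)

def changed_positions(colors_a, colors_b):
    """Indices at which two equal-length color reads differ."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]

first, reference = encode("ATGGC")
_, variant = encode("ATCGC")  # one true base substitution (G>C)

# A true substitution alters two adjacent colors.
print(changed_positions(reference, variant))

# A single miscalled color alters every base downstream when decoded, so it
# is not consistent with an isolated substitution and can be flagged.
miscalled = list(reference)
miscalled[1] = 2
print(decode(first, reference), decode(first, miscalled))
```

Because each base is interrogated by two overlapping colors, an isolated color change is unlikely to represent a real single-base variant, which is the basis for the lower error rates attributed to this encoding.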
Ion Torrent Sequencing
This platform was acquired by Life Technologies, which claims that this PostLight sequencing technology has the major advantage of being the first platform to eliminate the cost and complexity associated with the 4-color optical detection currently used in all other NGS platforms. The Ion Torrent method relies on standard DNA polymerase sequencing with unmodified dNTPs but uses semiconductor-based detection of hydrogen ions released during every cycle of DNA polymerization.66 Each nucleotide incorporation into the growing complementary DNA strand causes the release of a hydrogen ion that is sensed by a hypersensitive ion sensor.66 The initial Ion Torrent system has relatively low parallelism, so it tends to be focused on short sequence determination of mutation hot spots throughout the genome.
Comparison of NGS Platforms
The relative strengths and weaknesses of the NGS platforms as they are applied to the sequencing of the somatic cancer cell genome are summarized in Table 3. All NGS platforms have high entry costs, but all also have the potential to dramatically reduce the cost of comprehensive genomic profiling of cancer cells in the near future. Some techniques offer speed, such as the 454 Pyrosequencing and Ion Torrent platforms, but, compared with the Illumina and SOLiD platforms, may not be as well suited for clinical somatic tumor DNA sequencing owing to their relatively limited capacities for supporting highly parallel, deep sequencing. Some NGS platforms are better adapted to wide genome scans because their massively parallel capacity can support the depth of sequencing that somatic mutation detection requires, whereas others are more adept at the sequencing of targeted genomic regions. It is also important to note that all NGS technologies generate unparalleled quantities of data and are heavily reliant on the data processing, storage, and analysis tools that accompany the instruments and on downstream data analysis pipelines and storage strategies.62,67–70
There is great interest in simplifying the sample and data analysis workflow and improving the accuracy of mutation detection (also known as “mutation calling”) and identification of other genetic abnormalities obtained from NGS data sets. Similarly, cost reduction is being addressed by improving how efficiently each sequencing run is loaded. One approach relies on applying bar-code sequence tags or radio-frequency identification tags to sequencing libraries before pooled hybrid capture of the sequencing libraries.71
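The bar-code approach implies a computational step after sequencing in which pooled reads are reassigned to their source libraries. A minimal sketch of that step follows; the bar-code sequences, sample names, tag position (start of the read), and one-mismatch tolerance are all assumptions made for illustration.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to a sample by the bar-code tag at its start."""
    tag_length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        hits = [sample for barcode, sample in barcodes.items()
                if hamming(tag, barcode) <= max_mismatches]
        # Accept only unambiguous matches; everything else is set aside.
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(insert)
    return dict(bins)

barcodes = {"ACGT": "tumor_1", "TGCA": "tumor_2"}  # hypothetical tags
reads = ["ACGTGGGG", "TGCATTTT", "ACGAGGCC", "GGGGAAAA"]
print(demultiplex(reads, barcodes))
```

Tolerating a small number of mismatches recovers reads whose tag contains a sequencing error, at the cost of requiring bar codes that differ from one another at enough positions to remain unambiguous.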
Despite recent advances in the computational biology required to align the genome and differentiate the significant findings among the massive amounts of sequencing data generated by NGS, these emerging technologies will continue to face analytic challenges for some time. Clinical applications, particularly in oncology, demand high levels of accuracy, sensitivity, and specificity for sequence-based tests focused on calling out specific genetic alterations.
Comparison of Traditional and NGS Strategies for Cancer Cell Genomic Analysis
A comparison between traditional Sanger DNA sequencing and NGS for somatic cancer cell analysis is given in Table 4. The relative costs of the 2 approaches are of great importance to current and future test providers, consumers, and payers. Without question, the expertise, especially in computational biology, required to perform clinical NGS testing for patients with cancer is significantly greater than that required for traditional Sanger sequencing. Although the cost per base sequenced is higher for the traditional approach, on a per-test basis this “one gene, hot spots only” approach is less expensive than an NGS test incorporating hundreds or thousands of genes or a whole exome.
The predominant cell cycle stage of the sample to be sequenced can influence the final result, especially when gene copy number is being assessed by NGS. In regard to daily clinical pathology practice, traditional and NGS sequencing approaches are challenged by defining the best sample to test (eg, primary vs metastatic tumor tissue or tumor tissue vs circulating tumor cells), small samples such as from fine-needle aspiration biopsies, extensive necrotic rather than viable tumor tissue, tumor heterogeneity for mutations and other genetic abnormalities, and samples that feature a very low percentage of tumor DNA.
The relatively restricted sequencing capacity of traditional Sanger sequencing limits this approach to single-gene or hot-spot sequence analysis (eg, codons 12 and 13 of exon 2 in the KRAS gene), and the difficulties and high cost of multiplexing this technology for multiple gene analysis are significant drawbacks to this approach for cancer genomics. NGS platforms accommodate large-scale gene sequencing, which can determine the status of “expected” potential mutations in a given clinical situation and also discover “unexpected” sequence abnormalities, which, in turn, may significantly alter treatment planning based on genomic analysis. NGS sequencing can provide gene copy information such as homozygous and heterozygous deletions and gene amplifications, whereas traditional sequencing approaches cannot. NGS can also detect translocations, which also drive therapy selection, as in the case of the EML4-ALK translocation and the selection of crizotinib in NSCLC. Although the turnaround time for NGS of a multiplex (eg, >100-gene) cancer genome sequence is longer (eg, 7–14 days) than traditional single-gene hot-spot sequencing, it is anticipated that this difference will rapidly narrow as NGS technology continues to evolve.
Although published NGS data describing clinical cancer samples are limited, it seems that NGS will exceed traditional Sanger sequencing in sensitivity for mutation detection in samples in which a mutation is present in only a small percentage of the total DNA extracted from a specimen. Although traditional Sanger sequencing and NGS sequencing results are most easily interpreted for a tumor cell sequence when a matched reference germline sequence is available, improving algorithms and public databases of germline and somatic mutations that can be used to filter results are making direct tumor sequence assessment increasingly reliable. Despite rapidly advancing sequencing technology in an era of growing demand for a more personalized approach to oncology practice, it remains likely that traditional and emerging cancer cell diagnostics, including histopathologic slide–based results (immunohistochemical studies and fluorescence in situ hybridization), epigenetic testing such as methylation-specific RT-PCR profiling, full transcriptional analysis, and microRNA profiling, will be combined with tumor cell DNA sequencing in some form of integrated laboratory report.
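The relationship between sequencing depth and sensitivity for low-frequency mutations can be sketched with a simple binomial model. The model is an assumption: it treats each read as an independent draw, ignores sequencing and alignment errors, and uses a hypothetical minimum number of supporting reads as the calling rule.

```python
from math import comb

def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) at a given read depth."""
    p_fewer = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer

# A mutation present in 1% of the extracted DNA, requiring 5 supporting reads.
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, 0.01, 5), 3))
```

Under these assumptions the chance of observing enough variant reads rises steeply with depth, which illustrates why deep, highly parallel sequencing is expected to detect mutations present in only a small percentage of the DNA.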
Sample Procurement and Preanalytic Issues
By using their morphologic skills to identify appropriate tumor tissues, anatomic pathologists will have major roles in the introduction of whole genome sequencing into routine clinical practice. Although, for the present, the benefits of using FFPE specimens for clinical cancer cell gene sequencing make them the preferred source of DNA to be sequenced, the growth in molecular diagnostics may, in time, support a transition to freshly collected, buffer-stabilized tissue specimens.
In either case, a number of preanalytic factors are known to potentially affect the results of molecular diagnostic tests on solid tumors and hematologic malignancies. Among the factors affecting all FFPE surgical and biopsy specimens are specimen age, warm ischemia (time within the patient devoid of blood supply), and cold ischemia (time outside the patient before stabilization in buffer or formalin). For fixed specimens, important factors include characteristics of the formalin fixative (eg, concentration, zinc content, and pH), duration of fixation, type of tissue processing (routine or microwave-based), temperature of paraffin during embedding, and storage conditions of paraffin blocks.72–74
In general, DNA sequencing has been successful using FFPE material of relatively recent vintage, although as the age of FFPE specimens increases, they generally demonstrate increasingly significant DNA degradation. It is also well known that nucleic acids degenerate more quickly when they are stored as unstained microscopic slides vs intact paraffin blocks, most likely a result of enhanced tissue oxidation. Specimens with fragmented DNA are more difficult to sequence owing to the relative reduction in long DNA segments and the accumulation of short segments that are less likely to result in successful sequencing library generation.75
NGS can produce upwards of 1 billion sequences per instrument in a 4-day run, and the cost per base sequenced is far lower than for the traditional dye-terminator methods. NGS is highly dependent on sophisticated bioinformatic analysis programs and faces significant data management and interpretation challenges for confirming whether a specific locus in a single cancer-related gene is wild type, an SNP, or a somatic mutation. A great strength of NGS approaches is that they not only can detect base substitutions but also can simultaneously find insertions, deletions, copy number alterations, and translocations.
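The practical meaning of a run yielding about 1 billion sequences can be estimated with simple arithmetic: mean depth equals the number of usable bases sequenced divided by the size of the region targeted. In the sketch below only the read count comes from the text; the 100-base read length, the approximate 3.1 billion base genome size, and the function and parameter names are assumptions.

```python
def mean_depth(n_reads, read_length, target_bases,
               n_samples=1, on_target_fraction=1.0):
    """Average per-sample read depth over a target region."""
    usable_bases = n_reads * read_length * on_target_fraction
    return usable_bases / (target_bases * n_samples)

# One run of 1 billion reads spread over a single whole genome.
print(round(mean_depth(1_000_000_000, 100, 3_100_000_000), 1))
```

The same read budget concentrated on a smaller targeted region, or divided among pooled samples, changes the achievable depth proportionally, which is the trade-off between breadth and depth that shapes clinical assay design.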
Now that NGS is being widely applied to FFPE specimens, data defining its sensitivity and specificity limits for detecting cancer gene abnormalities will soon be known. This information will be critical for interpreting the many somatic genetic aberrations that occur in cancer with very low frequencies, many far less than 1%. Experts in the cancer sequencing field hold that as many as 75% or more of cancer gene sequence variations may be missed by a hot-spot-only analysis. However, it is equally clear that tens or hundreds of individual traditional mutation analysis assays to test for all of the possible important mutations in a given tumor are not practical or feasible given the amount of tumor sample, time, and money it would take to perform so many individual tests. Given the potential generation of “actionable” gene sequence abnormalities and abnormalities of unknown clinical importance, the successful NGS-based clinical service will have a result report that is focused on the abnormalities that can lead directly to the best available therapy, direct a patient to an ongoing clinical trial, or result in some other actionable response from the treating oncologist.
Cancer-specific NGS will rapidly expand from a focus on mutations and gene copy number alterations to include translocations followed by assessment of epigenetic abnormalities. Moreover, the starting sample may well change as understanding of cancer biology progresses. Currently, surgical biopsy and resection specimens are the most widely analyzed samples; however, medical practice suggests that a transition to testing ever-smaller amounts of DNA is likely to be required as fine-needle aspiration biopsy specimens, circulating tumor cells, and even DNA from apoptotic cancer cells circulating in plasma become clinically relevant patient specimens. The technologic advantages offered by NGS will eventually reduce the cost of sequencing and decrease the turnaround times that will support unprecedented analytic depths of understanding about tumor biology, enabling highly individualized patient care. Thus, as NGS enters clinical testing in 2011 (Figure 2), the impact on physician decision making and cancer outcomes will be widely followed by oncologists, pathologists, payers, and, most of all, patients with the disease.
Upon completion of this activity you will be able to:
contrast the advantages of next-generation gene sequencing technologies with the traditional Sanger, pyrosequencing, and allele-specific polymerase chain reaction approaches.
list the current uses of cancer cell gene sequencing for the management of patients with solid tumors.
The ASCP is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians. The ASCP designates this journal-based CME activity for a maximum of 1 AMA PRA Category 1 Credit ™ per article. Physicians should claim only the credit commensurate with the extent of their participation in the activity. This activity qualifies as an American Board of Pathology Maintenance of Certification Part II Self-Assessment Module.
The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose.