Next generation sequencing (NGS) has ignited an unprecedented pace of discovery in the biomedical sciences that is fundamentally transforming the way that we understand, diagnose and treat disease, and has motivated the belief that true precision medicine, tailored to an individual’s genetic, biochemical and exposure profile, will be a reality in the near term. With minimal sample requirements, NGS enables the concurrent genome-wide study of genetic variations, transcriptomes, and certain epigenetic modifications. However, no comparable technology exists for interrogating proteins as efficiently as NGS interrogates DNA and RNA, and this gap hampers more comprehensive views of molecular physiology and limits advances in biomedical science and precision medicine. Innovations in proteomic technologies have lagged far behind the advances in NGS, with current methodologies suffering from issues related to reproducibility, sensitivity, sample requirements, and limited multiplexing capacity. The development of proteomic technologies that overcome these limitations would fill this void in systems biology research, catalyze clinical innovations, and expedite the realization of precision medicine.
The Rise of DNA Sequencing and the Belief in Precision Diagnosis and Treatment
The massively parallel sequencing of DNA and RNA, via next-generation sequencing (NGS), has led to a surge of information about the determinants of human health and disease, anticipating a time when this information will be exploited in the clinic to better identify patients who should be treated in very specific ways ( Figure 1 ). The cost and technical efficiency of DNA sequencing have improved exponentially since the mapping of the first human genome, which came at a cost of roughly 3 billion dollars. Today, a single technician can go from a DNA sample to a mapped human genome in just a few days for about a thousand dollars.
NGS in clinical research has sparked the development of an exciting new era of diagnostics and prognostics, wherein specific genetic variations are assessed to predict disease susceptibility and drug responsivity. The power of this technology is perhaps best reflected by the growing list of federal and private-sector initiatives leveraging NGS technologies for the betterment of human health, including the Precision Medicine Initiative (PMI; https://www.nih.gov/precision-medicine-initiative-cohort-program ), the Genomics England project ( http://www.genomicsengland.co.uk/ ), 23andMe, Color Genomics, Foundation Medicine, Pathway Genomics, Sequenom, and countless others. These pioneering academic and commercial efforts have embedded NGS into the public consciousness and are steadily increasing the demand for NGS as a clinical mainstay. Indeed, many individuals are interested in having their genome sequenced, their ancestry profiled, and genetic predictions about their health provided ( 1 ). However, the reliable and accurate use of genetic testing for inherited disorders currently appears to be confined to specific Mendelian diseases, for which some 3,348 implicated genes have been identified, and to a few adverse reactions to specific drugs ( 2 ). More complex disorders are harder to predict with available genetic information alone ( 3–6 ). For cancer, the use of tumour DNA and RNA sequencing has also advanced insight enormously. For example, The Cancer Genome Atlas and the International Cancer Genome Consortium projects have led to an unprecedented and detailed understanding of tumourigenic lesions and the mutations acquired during somatic cell replication that are associated with them ( 7 ). Although these discoveries have immediate clinical utility, it is often the case that no single mutation can be used as a diagnostic indicator for the progression of normal cells to cancerous lesions ( 8 , 9 ) due to factors such as tumour heterogeneity ( 10 ).
This has consequences for the guidance of treatment decisions on the basis of the tumour mutation profile ( 11 , 12 ).
DNA Sequencing in Precision Prognosis, Early Disease Detection and Symptom Tracking
Although DNA sequencing has been leveraged in diagnostics, disease risk prediction, and pharmacogenetic initiatives, there is growing interest in its use in evaluating early or stable signs of disease in clinical settings. For example, nucleic acid fragments such as exosome cargo, cell-free circulating tumour DNA (ctDNA), or microRNAs are being considered in the diagnosis of cancer, autoimmune disorders, pregnancy, or transplantation rejection ( 13–24 ). Many of these fragments can be captured with NGS-based assays. Exosomes offer a breadth of information on how cells within the body communicate, and recent work suggests that exosomes might contain unique molecular cargo targeted towards very specific cell types ( 25 , 26 ). ctDNA has been studied by a number of researchers in the context of both cancer treatment prognosis and cancer diagnosis. However, ctDNA arises from either apoptotic or necrotic cells and can often only be detected at middle to late stages of cancer; hence the current belief that it is better suited to assessing treatment response ( 27–33 ). Thus, while these many advances in using genetic information and/or other nucleic acid content to make general predictions about disease and treatment response are promising, it appears that other biological molecules may be more useful. In this context, it has been suggested that proteins are much more likely to offer guidance on the diagnosis of disease or the understanding of one’s health ( 34 ), given that they are the products of many genetic phenomena, change in levels over time during disease pathogenesis, are often the direct targets of drugs, and are often used as drugs themselves.
Complications in the Interrogation of the Proteome in Health and Disease
Proteins are not only the most common effectors of disease pathogenesis, but are also determinants of treatment response. Unfortunately, protein analysis is technically challenging, inefficient, and represents a major bottleneck in efforts to advance our understanding of human health and to develop effective therapeutics. While the measurement of DNA and RNA has some value in the prediction of protein function, these measurements do not always correlate with protein levels ( 35 , 36 ) and are blind to the factors most critical to protein function, such as post-translational modifications. Despite the inherent limitations in proteomic methodologies, as many as 115 protein-based assays have been approved for use by regulatory agencies and commercialized with great success ( 37 ). As a result, it has been argued that the analysis of protein content within, e.g., blood can offer a snapshot of the overall health of an individual. Blood is an attractive tissue to assess because of its accessibility, but may not be ideal to study if the lesion associated with a disease is in a less accessible tissue whose protein levels do not correlate with those in the blood. Although we emphasize the interrogation of the blood proteome here because of its widespread appeal, exploring the proteome in other tissues (where possible) and contexts is certainly important. At any given time there may be as many as 500,000 proteins of various isoforms within the blood; these may be leakage proteins from tissues around the body, various immunoglobulins reflecting the state of the immune system, or a multitude of native blood-based proteins that reflect the overall health status of an individual ( 38 ). Unfortunately, the unbiased identification and quantification of proteins require their direct measurement with technologies that specifically detect their unique structure, mass, charge, or biochemical composition.
DNA, in contrast, is composed of four nucleobases that form highly predictable structures through Watson-Crick base-pairing interactions, can be amplified directly with reliable methodologies, and can be sequenced to identify its exact composition. These properties of DNA and RNA, most notably amplification and sequencing, have enabled NGS-based assays to be used for their interrogation, whereas the complexities of protein analysis have not allowed for similar developments in assays focused on the protein content of biological specimens. However, there are older and emerging technologies bringing the pace of discovery in proteomics closer to that of genomics and transcriptomics.
Interrogating the Proteome via Mass Spectrometry
Much research seeking to discover how the proteome affects various diseases or health statuses has leveraged mass spectrometry (MS). As many as 4,400 proteins, or more, in a single sample can be examined with MS, and 10 or possibly more samples may be run in parallel using isobaric tags ( 39 ). This allows an almost unparalleled view of the proteome, making MS a promising platform for discovery ( 40 ). Indeed, recent applications of MS include one that identified 13 blood proteins that could discriminate metastatic pulmonary nodules from benign nodules with a negative predictive value (NPV) of 94% ( 41 ). Another unique application of this technology included investigating protein-protein interactions in unprecedented detail ( 42 ). Even more impressive was the demonstration that the binding profiles of various drugs to a pool of roughly 7000 proteins could be obtained using a cellular thermal shift assay (CETSA) in tandem with quantitative MS ( 43 ). Despite these notable applications, MS-based analyses have some inadequacies, namely in sensitivity and throughput. MS has difficulties in the detection of low abundance proteins, which are likely to be implicated in the early stages of disease. Often, the limit of protein detection for MS is in the high picomolar range, which is orders of magnitude less sensitive than what is possible with other protein analysis technologies. Various methods of depletion have been utilized to remove abundant proteins from a sample, and more advanced methods such as multiple reaction monitoring (MRM) or parallel reaction monitoring (PRM) may be utilized to enable the detection of low abundance proteins ( 44–46 ). In addition, others have used immunocapture of low abundance proteins as a means to enrich them for detection; however, in these cases as much as 1 mL of sample is required to detect proteins in the pg/mL range ( 47 ).
These methods do work quite well, but sample preparation and the analysis of MS spectra can often lead to differing results depending on the personnel or lab performing the experiment ( 40 ). Thus, it can be argued that the real utility of MS is in identifying how the proteome changes broadly, while more sensitive technologies are required to complete the picture of the more subtle dynamics of the proteome ( 48 ) ( Figure 2 ).
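To put the units quoted above on a common scale, the following minimal Python sketch converts a mass concentration in pg/mL into a molar concentration. The 50 kDa molecular weight and the function names are illustrative assumptions, not values taken from the cited studies. Under that assumption, 1 pg/mL corresponds to roughly 20 fM, several orders of magnitude below the high picomolar range.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def pg_per_ml_to_molar(conc_pg_per_ml: float, mol_weight_da: float) -> float:
    """Convert a mass concentration (pg/mL) to a molar concentration (mol/L)."""
    if mol_weight_da <= 0:
        raise ValueError("molecular weight must be positive")
    grams_per_litre = conc_pg_per_ml * 1e-12 * 1e3  # pg -> g, then per mL -> per L
    return grams_per_litre / mol_weight_da  # (g/L) / (g/mol) = mol/L


def molecules_in_volume(molar: float, volume_ml: float) -> float:
    """Number of molecules present in a given sample volume."""
    return molar * (volume_ml * 1e-3) * AVOGADRO


# Hypothetical example: a 50 kDa protein at 1 pg/mL in a 1 mL sample.
molar = pg_per_ml_to_molar(1.0, 50_000.0)
print(f"{molar * 1e15:.1f} fM")
print(f"{molecules_in_volume(molar, 1.0):.2e} molecules in 1 mL")
```

Because the conversion depends on molecular weight, the same pg/mL figure maps to different molar concentrations for different proteins, so detection limits should be compared in consistent units.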
The Utility of Affinity Agents in Detecting and Quantifying Proteins
The broad survey of the proteome can be complemented by more focused surveys, and this is where novel affinity agents are of high utility. As noted, the analysis of proteins is difficult due to their high degree of structural variation, afforded by the large number of post-translational modifications, splice variants, and protein complexes that are possible ( 49 ). Affinity agents can be used to target specific proteins or protein variants with high sensitivity and usually with very high specificity. Numerous affinity agents have been developed, such as antibodies, DARPins, aptamers, small peptides, affibodies, leucine-rich repeats (LRRs), and others ( 50 , 51 ). Many of these agents have been used for both diagnostic and therapeutic investigative studies; however, only a limited number of these agents are in clinical use, which has hindered their widespread adoption ( 50–52 ). Two types of affinity agents have garnered the most attention and developmental focus: aptamers and antibodies.
Aptamers are relatively short oligonucleotides (50–100 nucleotides in length), and are developed through the iterative evolution of a random library of oligonucleotides until an aptamer of sufficient affinity is acquired. There have been numerous developments of these molecules, where the binding of proteins by aptamers is detected either directly, through capture and pull-down assays such as the SomaScan technology developed by SomaLogic, Inc. or through aptamers evolved to operate in a sandwich-type assay, or indirectly, through the release of other nucleic acids or fluorophores upon protein binding to the aptamer sequence (termed structure-switching aptamers) ( 53 , 54 ). In pioneering work, 813 aptamers were used to interrogate blood samples from patients with chronic kidney disease (CKD), leading to the identification of 58 additional, previously unknown biomarkers. The assay had a limit of detection (LOD) of ∼100 fM and was able to detect low abundance proteins that would have gone unnoticed by mass spectrometry ( 55–58 ). This type of analysis has also been performed to identify early indicators of non-small cell lung cancer (NSCLC), Alzheimer’s disease (AD), aging, Duchenne muscular dystrophy (DMD), and influenza ( 59–61 ). The combination or multiplexing of multiple protein biomarkers in a single assay is crucial; for example, for NSCLC, testing for the relative levels of 12 proteins allowed NSCLC to be classified with 91% sensitivity and 84% specificity ( 62 , 63 ). The generation of high-affinity aptamers, termed SOMAmers (or Slow Off-rate Modified Aptamers), which are composed of modified hydrophobic uracil residues as well as the normal nucleobases, allows this high degree of multiplexing. Impressively, as many as 1,129 proteins have been analyzed in parallel from a sample volume of less than 100 microliters. Others have generated aptamers of comparable affinities by using an expanded genetic alphabet for aptamer evolution ( 64 ).
However, aptamers with modified nucleobases may not be required for highly specific and highly stable aptamer-protein complexes. In fact, other methods have been developed to create high-quality aptamers using the canonical four bases of DNA. Wang et al. pioneered approaches that allowed aptamers with Kd values in the low picomolar range to be obtained with a minimal number of directed evolution cycles ( 65 ).
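Sensitivity and specificity figures such as the 91% and 84% quoted above for the 12-protein NSCLC panel translate into clinically interpretable predictive values only once disease prevalence is specified. The minimal Python sketch below applies Bayes' rule to illustrate this. The prevalence values and the function name are hypothetical and are not taken from the cited studies.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative test)
    return ppv, npv


# Hypothetical prevalences; sensitivity and specificity as reported in the text.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.91, 0.84, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

The same panel yields a much lower positive predictive value in a low-prevalence screening population than in a high-risk cohort, which is one reason a reported negative predictive value, such as the 94% cited for the MS-based nodule classifier, should be read alongside the population in which it was measured.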
Antibodies for multiplex protein analysis
Although the use of aptamers in the interrogation of protein biomarkers is extremely encouraging, the number of aptamers currently available for use is limited. The proteome is composed of numerous proteins with post-translational modifications, proteins that arise from mRNA splice variants, and protein complexes, all of which limit the ability to identify them with affinity agents. The availability of antibodies (∼2 million commercially available) ( 66 ) that could potentially identify all subtle variations in proteins makes them a very attractive set of molecules to use for protein interrogation, although their use does suffer from issues like cross-reactivity, batch variability, and application misuse ( 66–68 ). These issues have limited the degree of multiplexing that can be achieved, as sandwich-type multiplex experiments typically do not go beyond 20- or 50-plex assays, and a 100-plex assay could have as many as 35,000 possible cross-reactivity interactions ( 69 ). Although the construction of antibodies with high affinity to their targets can ameliorate cross-reactivity issues, the validation of mono-specific antibodies requires robust qualification data that many commercially available antibodies lack. Nonetheless, numerous recent studies have overcome these issues to push the limits of antibody multiplexing experiments in novel ways. Many of these technologies have moved away from the use of standard microarray and/or serial analysis of many markers (antibody arrays, multiplex flow-cytometry) due to limitations on throughput, availability of the fluorescence spectrum for multiplex analysis, or the requirement to assemble costly materials for each individual multiplex analysis ( 70–73 ).
One very recent development that has tremendous promise involves the attachment of ‘barcode’ molecules to antibodies. The direct attachment of barcodes (mass-tags, oligonucleotides, colour-coded beads) that can be de-convoluted with instrumentation allows proteins to be interrogated in various states (soluble, intracellular, cell surface) with relative ease and even at the single-cell level ( 74 ). This has been shown with methods that label antibodies with transition metal elements or short oligonucleotide sequences ( 75–78 ). Pioneering work using metal-tagged antibodies, termed mass-cytometry in practice, has allowed for a high-throughput analysis of single cells that is currently unmatched in the field. The high-throughput quantification of proteins in single cells has made it possible to understand how the protein networks within a cell can engender many different phenotypes through subtle changes in various proteins ( 79 ). For instance, in a study that sought to understand hematopoietic cell development, it was found that subtle differences in the cell surface proteins used to classify various cell types manifested in subtle changes in phosphorylation states or expression levels upon application of specific drugs ( 80 ). This type of analysis simply would not be possible without a technology that looks at many proteins at once from thousands of individual cells. More intriguing applications of mass-tagged antibodies have also been performed, such as multiplex protein imaging; however, such an analysis took days to perform ( 81 ). One downside to mass-cytometry is the lower sensitivity of the technology in comparison to more conventional flow-cytometry experiments that use fluorescent molecules rather than transition metals for protein detection.
Others have experimented with technologies that have the potential to match the throughput of mass-cytometry using clever PCR barcoding of the individual contents of a cell within a droplet ( 82 ).
Many of the applications that have been pursued with mass-tagged antibodies have also been shown to work with antibody-oligonucleotide conjugates (AOCs), from single-cell analysis to imaging ( 83 ). Of notable interest is a recent example that demonstrates the 24-plex protein analysis of single cells using AOCs in a proximity extension assay (PEA). In this paper, the authors sorted single cells into individual wells after staining the cells with conjugates, and then utilized PCR to amplify the reporter molecule (a dsDNA product) and determine the concentration of the proteins using qPCR ( 84 ). The higher specificity afforded by two antibodies binding to a particular protein allows for a greater degree of multiplexing, avoiding the cross-reactivity issues that plague normal sandwich-type assays ( 85 ). Using PEA, Albayrak et al. were able to analyze both the protein and exome contents of individual cells within droplets with a limit of detection in the hundreds of attomolar ( 86 ). A similar type of assay that utilized AOCs was used for analyzing fine-needle aspirates in a clinical setting, where the authors were able to observe dramatic tumour heterogeneity in protein expression ( 87 ). Even more intriguingly, their study revealed the intricacies of protein signalling upon drug application. In this particular case, gefitinib was used to inhibit EGFR signalling, and it was observed that in cells with high EGFR concentration the application of gefitinib inhibited the phosphorylation of the targets of the kinase domain of EGFR ( 87 ). Others have gone one step further and analyzed the total exome as well as 100 proteins in parallel from single cells. Darmanis et al. performed a study investigating the effects of BMP4 as a therapeutic agent for glioblastoma, and during their single-cell analysis could identify cells that were prevented from fully differentiating despite BMP4 application, but had specific RNA and protein signatures ( 88 ). The study illustrates the potential of AOCs in tandem with exome analysis to reveal the unique dynamics occurring within individual cells. The potential of AOCs is vast, but for their widespread utility there will need to be synthetic methods that allow for their cost-effective and routine production ( 89 , 90 ). This is highly important because, while fluorophores, enzymes, and small molecules are relatively easy to functionalize and conjugate at large scale, the attachment of oligonucleotides with a unique sequence will require individual syntheses. This has been addressed in recent work that describes the rapid formation of AOCs using novel bead-based synthetic methodologies ( 91 ).
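The qPCR readout described above for the proximity extension assay reports a cycle threshold (Ct) for each amplified reporter, with lower Ct values indicating more starting template. As a minimal sketch of how such values can be turned into relative abundances, the Python function below assumes ideal amplification (a doubling of product per cycle) and a reference measurement. The function name, the Ct values, and the normalization scheme are hypothetical, and the cited studies may process their data differently.

```python
def relative_abundance(ct_sample: float, ct_reference: float,
                       efficiency: float = 2.0) -> float:
    """Fold difference in reporter template between a sample and a reference.

    Assumes each PCR cycle multiplies the product by `efficiency`
    (2.0 corresponds to perfect doubling), so a sample that crosses the
    threshold n cycles earlier started with efficiency**n more template.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_reference - ct_sample)


# Hypothetical Ct values: the sample crosses the threshold 3 cycles before
# the reference, implying an 8-fold higher reporter abundance under ideal doubling.
print(relative_abundance(ct_sample=22.0, ct_reference=25.0))
```

Because the relationship is exponential, small differences in Ct correspond to large differences in abundance, which is part of what gives amplification-based protein readouts their wide dynamic range.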
The Future: Integration of Protein Assays with Other Assays
Ultimately, despite the impressive technologies that have been described here, there is a trade-off in the analysis of proteins: MS offers a very broad survey of the proteome with limited sensitivity, while affinity assays offer a very narrow survey of the proteome with very high sensitivity ( Figure 2 ). Currently, the use of aptamers offers the best compromise between breadth and sensitivity for proteome analysis. However, the limited number of available aptamers, as well as the currently complicated workflows, could impede their use. Future technologies could possibly allow for the single-molecule identification and quantification of proteins, offering both a broad survey of the proteome and high sensitivity, as has recently been shown using nanopore sequencing methods ( 92–94 ). However, much work still remains to be done. In the near term it is likely that many studies will seek to combine the analysis of protein, DNA, and RNA into a single workflow such that integrative studies can be performed routinely ( 95–100 ). This type of analysis will most likely serve as the de facto standard for precision medicine and the routine monitoring of health ( 101–105 ). In this light, the study and interrogation of proteins is absolutely critical, yet proteomics assays are only now catching up to genomics assays in terms of their efficiency, reliability, cost-effectiveness and ultimate adoption.