The next big thing on the Web—which World Wide Web inventor Tim Berners-Lee nicknamed “Web 3.0,” or the Semantic Web—may also be the next milestone in cancer information.
Already deployed in the U.S. census, the catalog of electronics retailer Best Buy, and Facebook pages, Semantic Web technology could give rise to “a killer app that allows the clinical oncologist to access, integrate, and analyze drug and genomics data, medical records, and other cancer-related information to enhance care and efficiency,” said Kei-Hoi Cheung, Ph.D., an associate professor at the Yale University School of Medicine Center for Medical Informatics.
Cheung’s research team at Yale—and groups at University College London (UCL) and the University of Mondragón in Spain—have published recent reports rethinking programs such as the Cancer Biomedical Informatics Grid (caBIG). A National Cancer Institute initiative established in 2004, caBIG was designed to facilitate rapid and seamless cancer diagnostic, statistical, genomic, and clinical trial information exchange across multiple data platforms, regardless of language, geography, or scientific discipline.
But the program came at a difficult juncture—the end of one bioinformatics revolution and the beginning of another. Conceptually ahead of its time with tools that lagged behind, caBIG came under fire for software problems and limited adoption by potential users.
The three research groups propose simpler, richer approaches that use Semantic Web technology—a form of artificial, computer-driven intelligence that represents a quantum leap over current Web technologies such as HTML—to identify connections between seemingly unconnected findings and infer hypotheses from raw data.
For oncology, swapping out Web 2.0 for Web 3.0 “has the potential to aid in disease diagnosis and treatment by enabling the clinical oncologist to find relevant information more quickly and accurately,” Cheung said.
Semantic Web Browser
To illustrate the Semantic Web’s potential power for cancer care and research, the computer science department at England’s University of Manchester partnered with Washington, D.C.-based software firm Clark & Parsia to develop a Semantic Web cancer information “browser.” Called “OwlSight,” it resembles familiar browsers such as Microsoft Internet Explorer or Mozilla Firefox.
A preloaded demonstration illustrates how a physician with four hypothetical patients could access the browser from his or her desktop. After securely uploading a patient’s medical history, physical exam data, and lab results, the physician would then click a browser button that asks, “What is the probability that patient Mary has breast cancer or is at risk for breast cancer?” The browser searches scores of worldwide cancer databases and, in seconds, returns a probability that moves closer to or farther from 100% certainty, depending on the type and amount of data available.
The same browser could also deliver the latest research findings in a clinically relevant way. For instance, a geneticist who discovers a tumor-suppressor gene enters the findings in an international genomics database. A statistician enters a population study about the gene’s effect on 5-year cancer survival in an epidemiology database. A biotech startup enters information about a drug that activates the gene in a pharmaceutical database. Finally, an oncologist with a patient who has the gene uploads the patient file into a Semantic Web browser.
The browser—which “crawls” scientific databases for information much the same way that Google crawls websites—presents the new drug treatment to the physician; predicts how it will affect 5-year survival; tracks the information sources for verifiability; and delivers other relevant, up-to-date information.
“It’s very encouraging to see people working on cross-disciplinary information platforms like this,” said John Robertson, M.D., vice chief of radiation oncology at the Beaumont Hospital Medical Center in Royal Oak, Mich. “If someone came up with a truly integrated information system, including ways to validate the information at hand, it would be a very big deal.”
Although the Semantic Web’s clinical applications are still hypothetical, groups are already using it to advance their research. The Yale group has used Semantic Web technology to show why decitabine (Dacogen)—a drug used to treat anemia and leukemia—also kills cells from several melanoma lines.
In a 2008 report in Cancer, researchers at the University of Texas M. D. Anderson Cancer Center in Houston first suggested that decitabine, which inhibits DNA methylation (a well-known oncogenic mechanism), could treat melanoma. Clinical trials followed, including the ongoing Combination of Decitabine and Temozolomide in the Treatment of Patients With Metastatic Melanoma at the University of Pittsburgh.
The Yale team sought a more complete picture of decitabine’s cancer-fighting mechanisms by loading a Semantic Web data warehouse they created called Corvus with information from Gene Ontology, a global bioinformatics program that standardizes genetic information.
The team then queried Corvus using SPARQL, a Semantic Web database-querying language, to ask why five of seven melanoma cell lines, with acronyms such as YUMAC and YULAC, are sensitive to decitabine.
The SPARQL query returned a Resource Description Framework (RDF) graph, a Semantic Web data structure that here represented how well decitabine inhibits each melanoma cell line. The graph not only returned data but also, in true Semantic Web fashion, generated a new and testable hypothesis for decitabine’s action. In the melanoma lines most sensitive to decitabine, the RDF showed the drug may activate genes that promote apoptosis.
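To make the idea of querying a graph concrete, the following is a minimal sketch in plain Python, not the Yale team's Corvus system and not real SPARQL. It assumes a tiny in-memory set of subject-predicate-object triples and joins three patterns the way a SPARQL basic graph pattern would. Every cell line, gene, and predicate name below is hypothetical and invented for illustration.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts, standing in for three separate sources:
# drug sensitivity, gene activity after treatment, and gene annotation.
TRIPLES: Set[Triple] = {
    ("line_A", "sensitiveTo", "decitabine"),
    ("line_B", "sensitiveTo", "decitabine"),
    ("line_A", "upregulatesOnTreatment", "gene_X"),
    ("line_B", "upregulatesOnTreatment", "gene_X"),
    ("line_C", "upregulatesOnTreatment", "gene_Y"),
    ("gene_X", "annotatedWith", "promotes_apoptosis"),
    ("gene_Y", "annotatedWith", "cell_adhesion"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def functions_in_sensitive_lines(drug: str) -> Dict[str, Set[str]]:
    """Join three patterns, roughly:
    ?line sensitiveTo <drug> . ?line upregulatesOnTreatment ?gene .
    ?gene annotatedWith ?function
    Returns each gene function mapped to the sensitive lines showing it."""
    result: Dict[str, Set[str]] = {}
    for line, _, _ in match(p="sensitiveTo", o=drug):
        for _, _, gene in match(s=line, p="upregulatesOnTreatment"):
            for _, _, function in match(s=gene, p="annotatedWith"):
                result.setdefault(function, set()).add(line)
    return result


if __name__ == "__main__":
    print(functions_in_sensitive_lines("decitabine"))
```

In this toy data, the only gene function shared by the sensitive lines is the apoptosis annotation, which is the kind of pattern a researcher might then treat as a hypothesis to test. A real deployment would use an RDF store and a SPARQL engine rather than nested loops.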
“We used the Semantic Web for its reasoning capabilities,” said team member and Yale University pathology professor Michael Krauthammer, M.D., Ph.D. “Our proof of concept illustrates how easily data from various sources can be integrated and reveals some of the power of Semantic Web reasoning for inferring and elucidating knowledge.”
An Oncology Ontology
Integrating the various dialects that scientists speak is another major challenge of applying information technology to oncology. “We have an informatics ‘Tower of Babel,’” NCI bioinformatics director Ken Buetow, Ph.D., told the Journal in 2004 (see J. Natl. Cancer Inst. 2004;96:580). “Each part of the cancer research community speaks its own scientific dialect,” said Buetow, who started and oversees caBIG. “They publish in their own journals. They deposit their data in their own databases.”
Although it showed early promise, caBIG has done little to streamline the babble, said J. Robert Beck, M.D., chief academic and medical officer at Fox Chase Cancer Center in Philadelphia.
“Despite the investment in caBIG and its underlying information architecture, caGrid, the system is not in general use among cancer researchers,” explained Beck, who has written many articles and presentations on the cancer information grid. A series of articles in the Cancer Letter earlier this year described many of caBIG’s problems, he added. The three-part series blamed software bugs and contracting, scientific, and legal disputes for sinking “NCI’s $200 million bioinformatics venture.”
One major problem at caBIG’s information architecture level: caGrid does not use the NCI thesaurus, which provides a common vocabulary, or ontology, for the cancer community, according to a December 2010 University College London (UCL) study published in Nature Proceedings and presented at the 2010 Semantic Web Applications and Tools for Life Sciences conference in Berlin.
In the study, UCL computational and systems medicine researchers Alejandra Gonzalez Beltran, Ph.D., Anthony Finkelstein, Ph.D., and Ben Tagger, Ph.D., also claimed that caBIG cannot infer hypotheses from data. Nor, they said, does it make full use of the so-called metadata it already contains—descriptions of data such as how it was discovered (e.g., in a lab or in hospital records), who discovered it, and why it is important in a given context.
“Metadata descriptions add context, organizing raw data into useful information,” said Brad Pollock, Ph.D., epidemiology and biostatistics chairman at the University of Texas Health Science Center at San Antonio. Pollock explained that caBIG functions near the low end of a knowledge hierarchy characterized by raw data on the bottom rung, information in the middle, and knowledge at the top.
Google also functions at the low end of the knowledge hierarchy. For example, typing raw patient data into Google, such as “fever, non-tender left lower quadrant mass, father had colon cancer, WBC, blood in stool,” might return millions of disorganized hits. A Semantic Web browser would instead interpret the data, returning a series of diagnostic probabilities that would doubtless include colon cancer, ranked according to likelihood.
“Raw data is just that—numbers, statistics, observations, with no particular order,” Pollock explained. “Knowledge is information transformed into a conclusion or hypothesis. The value added for the Semantic Web is its ability to transform raw data into real knowledge.”
No Longer Lost in Translation
To move caBIG up the knowledge hierarchy, the UCL team replaced the grid’s current computer language—called UML, or Unified Modeling Language—with its Semantic Web counterpart, Web Ontology Language, or OWL. An earlier attempt to unite researchers by using languages specific to their own disciplines, UML is anything but unified. Rather, it reflects an intrinsic disconnect between the nature of research and the nature of disease, said Aditya Vailaya, Ph.D., chief scientist at Retrevo, a marketing firm that uses Semantic Web technology to match consumers with electronic devices.
“Most diseases, especially cancer, are multifactorial, with multiple causes and symptoms. Most researchers, on the other hand, tend to focus on just one factor and report their results in their own language,” explained Vailaya, who patented artificial intelligence technologies as a scientist at diagnostics manufacturer Agilent. “The challenge is getting researchers to collaborate, in language that can be shared.”
The UCL group’s OWL system uses one easily shared source, the NCI cancer ontology, to “build queries which are high-level, descriptive, and applicable to underlying data” in any language, Gonzalez Beltran explains in her team’s paper.
An OWL query can connect an epidemiologist gathering statistical data about lung cancer survivors who express an oncogene, for instance, with a biophysicist who published a 3-D computer visualization of the oncogene inhibiting apoptosis. It’s a language translation tool that Fox Chase’s Beck calls “a natural enhancement to caBIG that likely will be embraced by users and researchers.”
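The translation role of a shared ontology can be sketched in a few lines of plain Python. This is an illustrative assumption, not the UCL group's OWL system or the actual NCI ontology: the concept names, database names, local labels, and records are all hypothetical. Each discipline's local label is mapped to a shared concept, and a query for a high-level concept also finds records filed under its more specific subclasses.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment of a shared ontology: child concept -> parent concept.
SUBCLASS_OF: Dict[str, str] = {
    "non_small_cell_lung_carcinoma": "lung_carcinoma",
    "small_cell_lung_carcinoma": "lung_carcinoma",
    "lung_carcinoma": "lung_neoplasm",
    "lung_neoplasm": "neoplasm",
    "cutaneous_melanoma": "neoplasm",
}

# Hypothetical mappings from (source database, local label) to a shared concept.
LOCAL_TO_CONCEPT: Dict[Tuple[str, str], str] = {
    ("epidemiology_db", "NSCLC"): "non_small_cell_lung_carcinoma",
    ("structural_db", "lung ca."): "lung_carcinoma",
    ("pathology_db", "melanoma, skin"): "cutaneous_melanoma",
}

# Hypothetical records, each labeled in its own discipline's dialect.
RECORDS: List[Dict[str, str]] = [
    {"source": "epidemiology_db", "label": "NSCLC", "item": "survival table"},
    {"source": "structural_db", "label": "lung ca.", "item": "3-D protein model"},
    {"source": "pathology_db", "label": "melanoma, skin", "item": "biopsy series"},
]


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by all of its ancestors in the hierarchy."""
    chain = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def query(concept: str) -> List[str]:
    """Return items whose mapped concept is the given concept or a subclass."""
    hits = []
    for record in RECORDS:
        mapped = LOCAL_TO_CONCEPT.get((record["source"], record["label"]))
        if mapped is not None and concept in ancestors(mapped):
            hits.append(record["item"])
    return hits


if __name__ == "__main__":
    print(query("lung_neoplasm"))
```

Here a single query for the lung concept reaches both the epidemiologist's table and the structural model, even though neither source uses that term. OWL reasoners support far richer inference than this subclass walk, so the sketch shows only the simplest case.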
Breast Cancer Cloud
Semantic Web applications are also being developed for specific cancers. A data server called Virtuoso is behind a breast cancer care and prevention application that the University of Mondragón team is developing for the Cruces Hospital in Bilbao, Spain.
“Data integration between departments is a painful task achieved manually most of the time,” said Mondragón team member and computer science professor Ainhoa Serna. The team hopes its application will allow all departments to share clinical, epidemiological, and research knowledge in a well-orchestrated, automatically monitored data cloud.
By converting files from XML, a common machine-readable format, into its Semantic Web counterpart, RDF, “our Virtuoso system transforms a typical health record into a semantic health record,” said team member Iker Huerga, CEO of Linkatu, a health care information technology provider in Mondragón, Spain.
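The conversion Huerga describes can be illustrated with a minimal sketch that uses only Python's standard library. It is not the Mondragón team's Virtuoso pipeline. The XML record, its element names, and the `http://example.org/` namespace are hypothetical placeholders; only the `rdf:type` identifier is a real, standard one. Each child element of the record becomes one subject-predicate-object statement, printed in the line-based N-Triples style.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical XML health record; element and attribute names are invented.
RECORD_XML = """
<patient id="p001">
  <birthYear>1970</birthYear>
  <diagnosis>breast_carcinoma</diagnosis>
  <mutation>BRCA1</mutation>
</patient>
"""

BASE = "http://example.org/"  # placeholder namespace, an assumption
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def xml_to_triples(xml_text: str) -> List[Triple]:
    """Turn the record's root element and each child into triples."""
    root = ET.fromstring(xml_text)
    subject = f"{BASE}patient/{root.attrib['id']}"
    triples: List[Triple] = [(subject, RDF_TYPE, f"{BASE}Patient")]
    for child in root:
        value = (child.text or "").strip()
        triples.append((subject, f"{BASE}{child.tag}", value))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples one per line; objects under BASE are written as
    identifiers, everything else as a plain quoted literal. Literal escaping
    is omitted, so values must not contain quotes or backslashes."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith(BASE) else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(xml_to_triples(RECORD_XML)))
```

Once a record is expressed as statements like these, it can be merged with statements from other departments or outside databases that use the same identifiers, which is the point of the semantic health record.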
Now speaking a universal language that can access data from around the hospital—and around the globe—the semantic health record might apply information from clinical trials and scientific articles to the patient’s medical history. It might recommend diagnostic procedures and treatments. It might send an instant text or e-mail message to a gynecologist warning that her patient has a BRCA1 mutation and therefore a 60% lifetime risk of developing breast cancer.
“The system uses rules defined by the oncologist to identify the most suitable information for a given patient,” Huerga explained. “It then carries out a matchmaking process between information and patient.”
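A rough sketch of such rule-based matchmaking follows, again in plain Python and again under stated assumptions: the rules, the patient fields, the resource descriptions, and the age threshold are all hypothetical and carry no clinical meaning. Each rule pairs a condition written by the clinician with the information to surface when a patient satisfies it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str                          # short label for the rule
    condition: Callable[[Dict], bool]  # test applied to a patient record
    resource: str                      # information to surface on a match


# Hypothetical oncologist-defined rules; the threshold is arbitrary.
RULES: List[Rule] = [
    Rule(
        name="brca1_carrier",
        condition=lambda p: "BRCA1" in p.get("mutations", []),
        resource="hereditary breast cancer risk guidance",
    ),
    Rule(
        name="screening_age",
        condition=lambda p: p.get("age", 0) >= 50,
        resource="screening mammography schedule",
    ),
]


def matchmake(patient: Dict, rules: List[Rule]) -> List[str]:
    """Return the resources of every rule whose condition the patient meets."""
    return [rule.resource for rule in rules if rule.condition(patient)]


if __name__ == "__main__":
    patient = {"id": "p001", "age": 42, "mutations": ["BRCA1"]}
    print(matchmake(patient, RULES))
```

Keeping the rules as data separate from the matching function means a clinician could add or retire a rule without touching the matching logic, which is one plausible reason to structure a system this way.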
The Semantic Web may be at the forefront of a data explosion that has become, according to UCL’s Finkelstein, “paramount to the detection, diagnosis, treatment, and prevention of cancer.”
But full deployment of Semantic Web technology is still a decade away, Yale’s Cheung explained. Computer languages such as OWL are “more difficult to learn and use” than old standbys such as HTML, he said. “Additionally, supporting tools like Semantic Web query engines lag behind.”
Also lagging: true artificial intelligence. “We still haven’t created a way for machines to learn on their own,” said Retrevo’s Vailaya. “We have to teach them everything.”
Without true artificial intelligence, the probability of machine-generated medical errors naturally rises. Human errors in data entry or computer programming compound any problems.
“The human factor is still missing from much of this work,” said Cheung. “We don’t have a natural-language way to ask the Semantic Web questions, for instance. We don’t know if the end result will be the Wild West—like the current Internet—or something more centralized, with more controls. What we hope for is a Google-esque application that can integrate information smartly, but how smartly will always depend on the human factor.”
Clinician John Robertson worries that even with “such a promising technology,” his time will always be too constrained to effectively use it. Forever rushed, physicians more often get information informally, comparing notes with peers at conferences, scanning journal headlines, or listening to pharmaceutical reps, Robertson explained. “The most computer-oriented way I get information is to look up articles on PubMed, which can be very time-consuming,” he said.
Despite his reservations, Robertson does see Semantic Web technology as “absolutely on the cusp of taking us forward,” especially with new generations of physicians already dialed into online, machine-enhanced information sharing.
“I’m sure that we will have better, more intelligent ways to integrate medical and biological data,” he explained. “All of this new research gives me hope.”