Harnessing computing power for cancer research and care
June 17, 2010 | Paul Govern
On iTunes U, a lecturer predicts that we will one day be routinely giving drugs to computers.
When you get sick, someone will load your medical history and your genome sequence (and perhaps other selected data about your biochemical composition) into a simulator. As various drugs are entered, you’ll learn of their predicted effects in your body.
While such scenarios may lie in our (perhaps still distant) future, it will take time – and some major computing power – to get there.
Biology has quite recently entered a golden age. Majestic, beckoning mountains of biological data – composed primarily of information about our genomes and the subtle differences among them – are filling up the planet’s servers at explosive rates.
Computer-assisted analysis of such huge data sets is the realm of “bioinformatics” – a discipline that applies information/computer science approaches to help researchers make sense of this biological information.
But that is only part of the equation; achieving this futuristic scenario will also require the contribution of “biomedical informatics,” a field focused on applying the power of computers to health care through, for example, electronic medical records and decision support systems.
Though they had divergent beginnings, the fields of bioinformatics and biomedical informatics are now beginning to merge.
“We’ve had bioinformatics, which essentially grew out of molecular biology and had the vocabulary and the cultural values of wet-bench biologists. It was just trying to figure out how life works, what’s the machinery like,” says Dan Masys, M.D., professor and chair of Biomedical Informatics at Vanderbilt, one of the nation’s largest biomedical informatics departments.
“And then we had clinical informatics, which grew historically out of people building electronic medical record systems for hospitals.
“These two types of people, if you put them in a room they would not have much to talk about, but what we’re seeing is the emergence of this relentless convergence of these tools in an area we call clinical bioinformatics. And that’s about understanding molecular patterns that have direct relevance to human health and disease and health care decision-making.”
Life encoded…and corrupted
It all starts with DNA, an information storage molecule in the nucleus of cells. Just as a computer stores information in a digital code of zeros and ones, DNA encodes the basic instructions for life using four chemical “letters” or bases: A, T, C and G (adenine, thymine, cytosine and guanine). These letters, positioned in “nucleotide” pairs, form the rungs of DNA’s “twisted ladder” or double helix. The double helix unzips down the center, allowing the sequence on either side to be transcribed as RNA molecules. In tiny automated workshops called ribosomes, some of the RNA gets read off and translated into big, bad proteins, the manifold machinery of cellular life.
Genes are a knotty concept, but they’re basically sections of DNA corresponding to proteins, or to otherwise interesting bits of RNA. They function in networks, with some genes making products that activate and deactivate other genes. There are approximately 22,000 genes in the human genome.
As our cells divide, the genetic code is replicated and bequeathed to daughter cells, and despite a great deal of error checking, the nucleotides are vulnerable to sporadic scrambling, a “corrupted code” in computer-speak. Radiation, chemical exposures and viruses can also cause scrambling. When you add it all up, we’re riddled with random mutations: single-nucleotide changes, insertions, deletions large and small, chromosome inversions, translocations between disparate chromosomes, and amplification, meaning gene duplication that can boost production of a given gene product.
In the time it takes to read this article, you’ll develop umpteen thousands of new mutations, but the overwhelming majority come to naught; only on very rare occasions is a mutation plucked from obscurity and thrust into a consequential role. Germ-line mutations (those in reproductive cells), which can be passed on to offspring, are a ticket out of the primordial sludge: the only ticket.
Cancer can arise when somatic mutations (those in bone, brain, blood, skin or what have you) randomly mount up into some unlucky combination, giving a cell the keys to unchecked growth. The odds worsen as we age and carry around more and more mutations.
With so many ways for so many nucleotides to fall out of proper sequence, one begins to grasp the role of computers in understanding genetics and cancer.
Take genome sequencing, which is generally preceded by chopping an organism’s DNA into thousands of fragments of varying lengths, amplifying the fragments (so that there is more material to work with) and locating the letters along each fragment (with an older method called electrophoresis, which reveals the positions of nucleotides by separating fragments according to their relative sizes). Once these myriad ordered sets are loaded into a computer, an algorithm (a set of rules for solving a problem) can be put to work, attempting to assemble as much of the code as possible from the overlaps found among the sets.
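The overlap step can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method of any particular sequencing project: it assumes short, error-free fragments and repeatedly merges the pair with the longest suffix/prefix overlap (a simple greedy heuristic; production assemblers rely on more elaborate overlap or de Bruijn graphs). The function names and the sample fragments are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix
    of `b`, provided it is at least `min_len` letters long; otherwise 0."""
    start = 0
    while True:
        # Look for the first min_len letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a from that point is a prefix of b, it is an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments by their longest overlaps until none remain.
    Returns one or more contigs (assembled stretches of sequence)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate fragments
    # Drop fragments wholly contained in a longer one; they add no information.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Four overlapping fragments of a made-up 19-letter sequence.
fragments = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(fragments))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can be fooled by repeated stretches of DNA, which is one reason real genome projects use graph-based assembly.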
Without computers, it’s hopeless. But bioinformatics has never been primarily about technology for Masys, who, years ago as a young oncologist, recognized the power of computers in revolutionizing biology and medicine.
It’s instead about “understanding the semantics of the data. It’s about finding related observations about biologic behaviors of cells in a variety of different kinds of databases – genes, proteins, carbohydrates, control factors,” he says.
“At a molecular level, it’s figuring out how the hip bone is connected to the thigh bone.”
The power of computers in finding biologically important patterns is illustrated by a pivotal event from the mid-1980s. Researchers working at computers in London and San Diego were struck to find that an entity called the v-sis oncogene, which they regarded as a cancer gene, was the spitting image of a harmless – in fact, essential – gene that molecular biologists had identified as producing a growth factor.
“A light went on,” says Masys. “The oncogene was just the growth factor gene switched on at the wrong time, when cells shouldn’t be proliferating.”
“That was a key insight in the history of informatics, because that pattern-matching done by computers was just a matter of searching through databases of gene and protein sequences that had been deposited by many different labs for many different purposes, and it led to a key biological insight that nobody in the world expected.”
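The kind of database search Masys describes can be sketched with a local-alignment score. The example below is a minimal Smith-Waterman scorer with an assumed, simplified scoring scheme (match +2, mismatch -1, gap -2; real tools such as BLAST use substitution matrices and speed heuristics), used to find which entry in a tiny database best resembles a query. The sequences and names are hypothetical stand-ins, not the actual v-sis or growth-factor sequences.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.
    Only the best score is returned; the traceback is omitted."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero: a local alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query: str, database: dict[str, str]) -> tuple[str, int]:
    """Return (name, score) of the database entry most similar to `query`."""
    scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical protein fragments (one-letter amino acid codes).
db = {
    "growth_factor_X": "PPPPMKTLLVAGGQQ",
    "unrelated_Y": "WWWWWWWWWWWW",
}
print(best_hit("MKTLLVAGG", db))  # ('growth_factor_X', 18)
```

Local rather than global alignment is the natural choice here: a shared stretch buried inside two otherwise different sequences still scores highly.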
Where illness is concerned, the focus narrows to a particular disease, with analysis of sample after sample in a search for molecular patterns: comparisons between disease samples and normal samples, between samples from patients sensitive to a particular drug and patients who are not, and between samples from patients who had a recurrence and patients who remained disease free. (A favored strategy for zeroing in on disease-related genes and their associated intracellular signaling pathways is microarray analysis, an economical method for measuring differences in gene expression across part or all of the genome.) Ultimately, the hunt is for biomarkers: traceable substances that reliably indicate biological states.
But a data pattern and a cause-and-effect relationship are quite different things.
Associate Professor Zhongming Zhao, Ph.D., M.S., is a pattern finder.
“It’s always association, whether this gene or set of genes is potentially related to this disease,” he says. “If you find a pattern in a gene or a set of genes that’s always different [more than would be expected] by chance between two sets of samples, it’s a potential biomarker. Then you try to validate.”
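One common way to ask whether a gene’s expression differs between two sets of samples by more than chance would explain is a permutation test: reshuffle the sample labels and count how often a difference at least as large turns up. The sketch below enumerates every possible relabeling exactly, which is only practical for small groups; the expression values are made up, and it illustrates the general idea rather than the specific methods Zhao’s group uses.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a: list[float], group_b: list[float]) -> float:
    """Exact two-sided permutation test on the difference in group means.

    Returns the fraction of all ways of splitting the pooled samples into
    groups of the original sizes whose mean difference is at least as
    extreme as the observed one."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression levels of one gene in tumor vs. normal samples.
tumor = [10.0, 11.0, 12.0]
normal = [1.0, 2.0, 3.0]
print(permutation_p_value(tumor, normal))  # 0.1 (2 of the 20 possible splits)
```

Even with perfectly separated groups, three samples per side cannot push the p-value below 0.1, which is one reason a promising pattern is only a candidate biomarker until it is validated in larger, independent sample sets.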
Zhao came to Vanderbilt-Ingram in late 2009 as chief bioinformatics officer and director of the Bioinformatics Resource Center. In 2000, as he saw the human genome project reaching completion, Zhao decided to go back to school for a master’s degree in computer science – this after already having earned master’s degrees in genetics and biomathematics and a doctorate in human population genetics.
“I realized the coming of a lot of genomics data. I decided to study computer science because I knew we couldn’t do it by hand; we needed to analyze by some intelligent modeling algorithm.”
For diseases like cancer, molecular biomarkers – for a given diagnosis, prognosis, drug response – usually come with probabilities attached. Gene expression profiling might tell you, for example, that it’s 80 percent likely that you have prostate cancer. According to Zhao, more sophisticated integration of data is destined to yield increasingly probative, multi-dimensional biomarkers.
“We encourage cancer investigators right now to always analyze genomic data from two platforms. We want to try to perform more advanced analysis, combining DNA sequences, mRNAs, proteins and their interactions.”
Connecting genes to clinical practice
There’s a project taking shape at Vanderbilt-Ingram that hinges on the power of bioinformatics and biomedical informatics.
Vanderbilt is preparing to launch routine genotyping of tumor tissues, looking for known and suspected genetic biomarkers that could help steer more precise, less toxic cancer treatment.
Chemotherapy is toxic and hit or miss. The emergence of molecular biomarkers is leading to finer and finer sub-typing of cancer and a rush to discover targeted drugs designed to interfere with abnormal molecules while remaining nontoxic to normal cells.
Targeted cancer drugs are entering the pipeline, and a few have already emerged with accompanying genetic biomarkers for predicting patient response.
Associate Professor William Pao, M.D., Ph.D., is an oncologist and cancer biologist with a special interest in tyrosine kinases (TKs), a group of intracellular signaling enzymes, which, when mutated so as to become stuck in the “on” position, are implicated in cancer. He has studied the mechanisms of targeted drug compounds that inhibit certain mutant TKs while allowing normal TKs to carry on.
As director for personalized medicine at Vanderbilt-Ingram, Pao has led the planning for routine genotyping of tumor tissues. Massachusetts General Hospital and Sloan-Kettering are two centers known to have already begun routine clinical genotyping of cancer tumors. At Vanderbilt there will be a major new twist: concurrent with these new clinical assays, and building on the Medical Center’s strengths in clinical informatics (e.g., electronic medical records), Vanderbilt-Ingram will begin developing a system for personalized medicine.
As more and more tumor genotyping is digitally collated with other information from medical records, the expectation is that a new understanding of cancer and a new level of clinical decision support will emerge. This is clinical informatics, the bailiwick of Mia Levy, M.D., who arrived last August as clinical informatics officer at Vanderbilt-Ingram.
Levy recalls entering medical school (after undergraduate study in bioengineering and work as a programmer) thinking that “I might not even practice medicine and I would just go into informatics.
“I wanted to know what the real problems of physicians were; I didn’t want to just build tools without really understanding what their issues were.”
In her third year, during a rotation at the National Library of Medicine, she foresaw an increasing role for informatics in the understanding and treatment of cancer, and she wound up combining training in oncology with work toward a doctorate in biomedical informatics.
“I know [Cancer Center Director] Jennifer Pietenpol likes to brag that there’s less than a handful of medical oncologists who are also trained in informatics and we have two of them at Vanderbilt – Dan Masys and myself – so that’s pretty cool.”
Before Levy the student espied the rich informatics vein in oncology, there was a more strictly personal side to her interest in this medical specialty. During her first year in medical school, her mother was diagnosed with metastatic breast cancer.
“She was fortunate to live for seven years before she passed away, a few years ago. And I became a breast oncologist,” Levy says.
“I’m Jewish, and so many women in my mother’s sphere have been affected by breast cancer. So maybe it’s a little self-preservation, but it’s also that I’m just trying to help any way I can.”
Research by Pao and others has established that a single point variant (single DNA letter change) can all by itself signal a likely protein malfunction and thus furnish a response prediction for a targeted drug. Conversely, the lack of a point variant at some given site may forecast zero response to some given drug.
At Vanderbilt, whenever pathology is positive for certain types of cancer (starting with lung cancers and melanomas), genotyping in the clinical molecular biology lab will be initiated automatically, screening cancers for approximately 40 different mutations in up to nine different genes. The mutation sites qualify as biomarkers for predicting response to established drugs or drugs in clinical trials.
“It’s at the gene level that we’re going to be providing decision support to our clinicians: is EGFR mutated or not, is KRAS mutated or not, is BRAF mutated or not,” Levy says.
She has been pleased to learn that the lab apparatus for these assays will be returning structured, machine-readable data, but that still leaves questions about how best to present the results to clinical teams.
“This is not like a chemistry panel done for a hundred years. This is new genetic data, which has not been well represented in the electronic health record before. There’s a whole nomenclature for genes and amino acids. Our system will need to pull in these results and represent them in the medical record in a coded way that people can reason with.”
She says the decision support initially will amount to the display of any clinically relevant genotype results, followed soon by flags and automated messages when a result suggests that enrollment in a given clinical trial should be considered.
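The gene-level logic Levy describes (is EGFR mutated or not, is KRAS mutated or not, is BRAF mutated or not) can be sketched as a small rules layer over structured assay results. Everything here is a hypothetical illustration: the `VariantCall` record, the three-way status (mutated, not mutated, not tested) and the trial-flag message are assumptions, not Vanderbilt’s actual data model or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """One structured result from a genotyping assay (hypothetical format)."""
    gene: str            # gene symbol, e.g. "EGFR"
    protein_change: str  # e.g. "L858R"
    detected: bool       # True if the assay found this mutation


def gene_status(calls: list[VariantCall], panel: list[str]) -> dict[str, str]:
    """Collapse mutation-level calls into one status per gene on the panel."""
    status = {gene: "not tested" for gene in panel}
    for call in calls:
        if call.gene not in status:
            continue
        if call.detected:
            status[call.gene] = "mutated"
        elif status[call.gene] != "mutated":
            status[call.gene] = "not mutated"
    return status


def trial_flags(status: dict[str, str], rules: dict[str, str]) -> list[str]:
    """Return the message for every gene whose rule is triggered by a mutation."""
    return [message for gene, message in rules.items()
            if status.get(gene) == "mutated"]


calls = [
    VariantCall("EGFR", "L858R", True),
    VariantCall("KRAS", "G12D", False),
]
status = gene_status(calls, ["EGFR", "KRAS", "BRAF"])
print(status)  # {'EGFR': 'mutated', 'KRAS': 'not mutated', 'BRAF': 'not tested'}
rules = {"EGFR": "Consider EGFR-targeted trial (hypothetical rule)"}
print(trial_flags(status, rules))  # ['Consider EGFR-targeted trial (hypothetical rule)']
```

Keeping “not tested” distinct from “not mutated” matters for decision support: a gene absent from the panel results should not be read as a negative result.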
AI in the hospital
Levy is also charged with a much larger project: using biomedical informatics to aid treatment prioritization for cancer patients. She stresses that this new automated reporting of genotyping is only a starting point. Both she and Masys envision a system that, practically on its own, will be able to generate knowledge about the best ways to treat cancer – a sort of “artificial intelligence” for health care.
“What we’re going to do with the next generation of decision support is record decisions, then track patients and see what happens,” Masys says. “If we can harvest molecular patterns and how their stories play out clinically, in a short time we can improve the treatment rules, instead of waiting years for clinical trial results. It’s this very interesting synthesis of practice and research, being fused together and building at a fast pace.”
This concept is often given as “today’s patients informing tomorrow’s care,” Levy says. “The premise is that we don’t know what patterns will emerge, what will turn up as relevant, so we need a system that can learn that and form a basis for decision support.”
Standardized cancer patient assessment is a major prerequisite for the learning system envisioned by Levy and Masys. The system will use all manner of standardized data – lab results, diagnoses, co-morbid conditions, disease staging, treatment selections, details on the management of treatment, and finally patient response.
“If we have all that, we can begin to do population-based analysis, and we can test and tweak treatment selection methods until we reach optimum outcomes,” says Levy. “But getting people to put in structured data is always a time-suck and they’re always looking for someone else to do it.”
So the challenge is getting more patient information into standardized and machine-readable form without slowing down work in patient care areas and clinical labs.
Take imaging results for cancer. To have any hope of studying patterns of tumor response across a population, the radiology reports for each patient need to be consistent across the course of treatment, at every stage measuring the same tumors, arriving at a longitudinal string of values representing the patient’s tumor burden. And because this makes for relatively laborious reporting, it’s currently done only for clinical trials; according to Levy, most oncologists who work with adult patients have grown inured to receiving radiology reports that don’t clearly connect up from one stage of treatment to the next. What’s more, radiology reports currently come back as unstructured text.
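Once reports are consistent, the tumor-burden arithmetic itself is simple. The sketch below assumes each report lists the longest diameter, in millimetres, of the same named tumors at every visit; it sums them into a single burden value, reports the percent change from baseline, and refuses to compare visits that measured a different set of tumors, which is the consistency problem described above. Summing diameters resembles the approach of the RECIST response criteria, but any formal thresholds are deliberately left out, and the tumor names and values are hypothetical.

```python
def tumor_burden_series(visits: list[tuple[str, dict[str, float]]]):
    """Compute tumor burden over time.

    visits: list of (label, {tumor_id: longest_diameter_mm}) in time order,
    with the first entry taken as the baseline.
    Returns a list of (label, burden_mm, percent_change_from_baseline)."""
    if not visits:
        return []
    _, baseline = visits[0]
    tracked = set(baseline)
    baseline_sum = sum(baseline.values())
    if baseline_sum == 0:
        raise ValueError("baseline burden must be positive")
    series = []
    for label, tumors in visits:
        # Every report must measure exactly the tumors measured at baseline.
        if set(tumors) != tracked:
            raise ValueError(f"{label}: measured tumors differ from baseline")
        burden = sum(tumors.values())
        change = 100.0 * (burden - baseline_sum) / baseline_sum
        series.append((label, burden, change))
    return series


visits = [
    ("baseline", {"liver_1": 40, "lung_1": 20}),
    ("week 8", {"liver_1": 30, "lung_1": 12}),
]
for label, burden, change in tumor_burden_series(visits):
    print(f"{label}: {burden} mm ({change:+.1f}%)")
# baseline: 60 mm (+0.0%)
# week 8: 42 mm (-30.0%)
```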
Levy has a plan for pulling consistent, structured radiology findings into a cancer patient assessment system. An ad hoc community of computer experts has collaborated over the Web to create an open-source markup tool for radiology images. With images loaded, users demarcate tumors with mouse clicks and the tool spits out the tumor dimensions. Levy is planning to import this tool into the Vanderbilt system, setting it up so that, as radiologists call up new images for cancer patients, the system will automatically retrieve any past reports and supply guidance for generating standardized, structured follow-up data, the sort that will be allowable in the cancer population assessment database.
Cancer pathology (based on visual examination of cells) is another Vanderbilt report that currently issues as unstructured text, so Levy is weighing solutions for getting it into structured form. As for structured cancer staging data, Vanderbilt’s electronic medical record system already has a module that oncologists can use to enter this information. But not everyone uses it.
“We have to work on that, so that clinicians understand that, if they put it in, they get this secondary gain,” Levy says. Perhaps most challenging will be to gather information on side effects like nausea and diarrhea, information currently adrift in clinical notes written by physicians and nurses; one possibility is natural language processing, which works by text mining and keyword identification. The system will also need to know various patient preferences (a grim real-world example: would the patient prefer hair loss or diarrhea with his chemo?).
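Keyword identification of the kind mentioned above can be sketched in a few lines. This is a deliberately crude illustration, not a clinical NLP system: the synonym list, the 30-character look-back window and the negation cues are all assumptions, loosely in the spirit of rule-based negation detectors such as NegEx, and real clinical notes would need far more robust handling.

```python
import re

# Hypothetical mapping from a side-effect concept to phrases that signal it.
SIDE_EFFECT_TERMS = {
    "nausea": ["nausea", "nauseated", "nauseous"],
    "diarrhea": ["diarrhea", "loose stools"],
    "hair loss": ["hair loss", "alopecia"],
}
# Crude negation cues, checked just before each keyword match.
NEGATION = re.compile(r"\b(no|denies|without)\b")


def find_side_effects(note: str, window: int = 30) -> set[str]:
    """Return side-effect concepts mentioned, and not negated, in a note."""
    found = set()
    for sentence in re.split(r"[.;\n]", note.lower()):
        for concept, terms in SIDE_EFFECT_TERMS.items():
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                for match in re.finditer(pattern, sentence):
                    preceding = sentence[max(0, match.start() - window):match.start()]
                    if not NEGATION.search(preceding):
                        found.add(concept)
    return found


note = "Patient reports nausea after cycle 2. Denies diarrhea. Mild alopecia noted."
print(sorted(find_side_effects(note)))  # ['hair loss', 'nausea']
```

Mapping several phrasings to one concept is what turns free text into something a population-level database can count; the negation check keeps “denies diarrhea” from being recorded as diarrhea.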
Beginning in July 2011, Levy plans to start rolling out a cancer decision-support-cum-ordering system, covering chemo, targeted drugs, labs and imaging – and spanning inpatient, outpatient and home medication realms.
As standardized assessment kicks in and the system starts to learn, it won’t be left to evolve entirely on its own. Guidance for best practice, especially regarding drugs, is the province of the pharmacy and therapeutics committee.
“This will be an enhancement and escalation of complexity in terms of the types of data that drive the committee,” Masys says, “but the fundamental combination of people, process and technology are all there – just need to pour in the new data.”
As the system matures, at some point it will become appropriate to push some of the decision support features directly to patients.
Acquiring the types of personal genomic data required to do this used to be cost-prohibitive; it initially cost billions of dollars to sequence the human genome.
“Now, to actually get the complete sequence of a cell, it’s in the range of about $20,000 to $50,000,” Masys says, “and we think, with so-called next generation sequencing technologies, that within three to five years you could get your entire genome in your electronic medical record for less than the price of a CT scan, for about $1,000.”
We may be giving drugs to computers sooner than we thought.
Photos by Susan Urmy
Computer and genetic code images: ©iStockphoto.com/enderbirer and ©iStockphoto.com/zmeel