By now, most people who follow VA health care have heard of the Million Veteran Program. Launched earlier this year, with some 15,000 Veterans enrolled already, MVP is on target to build the world's largest database of health and genetic information. The goal is to better understand the role of genes in disease risk and response to treatment, and to advance medical care that is personalized based on genetic makeup.
Few people, though, know anything about the infrastructure behind the effort. The amount of data collected through MVP will be massive. Creating order out of all of it, and enabling researchers to use it effectively over the coming years, is the challenge of an era for the relatively young field of bioinformatics.
A team at the Boston VA Healthcare System has designed an ambitious, outsize system called GenISIS to meet the need. Backed by huge clusters of servers housed in two locations, the system links de-identified patient DNA samples and health information with a multitude of VA and non-VA databases, along with computer applications such as a call center and mail center to manage MVP enrollment, appointment-setting, and information-gathering.
"I think this is going to be a big deal not only in the world of informatics, but in the world of patient care," says Leonard D'Avolio, PhD, one of the visionaries and driving forces behind GenISIS. The acronym stands for "Genomic Information System for Integrated Science."
D'Avolio is associate director for biomedical informatics at VA's Massachusetts Veterans Epidemiology Research and Information Center. The center is home to the biorepository where bar-coded MVP blood samples are deep-frozen at minus 30 Celsius and robots help process and retrieve these and other research specimens. The biobank is now undergoing expansion and will eventually house up to four million samples, from MVP and other research.
If the biblical tale of Genesis recounts the creation of the first human—genome and all—VA's GenISIS promises to help scientists make sense of that genetic code. The system will allow for studies, many using MVP data, on any number of conditions that affect Veterans, from diabetes and depression to Parkinson's and PTSD. Experts hope the result will be a flow of genetic discoveries to help guide diagnosis and treatment.
Citing PTSD as an example, VA's chief research and development officer, Joel Kupersmith, MD, told National Public Radio in a recent interview, "We'll look at patients with PTSD and patients who don't have PTSD and … see if there's a gene that's there in the patients with PTSD that isn't there in the patients who don't have it."
Billions and quadrillions of variables
Genetically speaking, each person's cells carry within them some 3.2 billion bits of data. That's how many pairs of nucleotides, or chemical bases, are in the human genome. This represents tens of thousands of protein-coding genes, plus lots of other DNA. By and large, the precise role of one stretch of DNA versus another remains a vast unsolved mystery. There are countless possible variants that could affect health, and scientists have yet to learn about most of them.
"And it's not just the genome," points out D'Avolio. "Each patient also has hundreds if not thousands of other relevant pieces of information"—facts about his or her current and past medical conditions, lab values, prescriptions, family history, lifestyle, environmental exposures. Some Veterans who take part in MVP will have a VA electronic health record going back two decades.
Multiply these billions of data points for each person by the million Veterans whom VA expects to take part in MVP, and you get a figure in the quadrillions. Not even the federal budget deficit is on that scale! These mind-boggling numbers reflect the number of permutations researchers could potentially analyze using MVP data, in terms of how DNA interacts with other factors to affect health.
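For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch. It counts only the genome figure cited above (3.2 billion base pairs per person) and the projected enrollment of one million, and it leaves out the health-record data points, so it is a floor, not an estimate of what GenISIS actually stores. The variable names are made up for illustration.

```python
# Back-of-the-envelope scale check using only figures quoted in the article.
BASE_PAIRS_PER_GENOME = 3_200_000_000   # about 3.2 billion nucleotide pairs per person
PROJECTED_ENROLLMENT = 1_000_000        # the "million" in Million Veteran Program

QUADRILLION = 10**15

# Genome data points across the full projected cohort.
total_data_points = BASE_PAIRS_PER_GENOME * PROJECTED_ENROLLMENT

print(f"{total_data_points:,} data points")
print(f"about {total_data_points / QUADRILLION:.1f} quadrillion")
```

Adding the hundreds or thousands of clinical facts per patient barely moves this total. Their weight comes from the combinations they form with the genetic data, which is where the truly large numbers arise.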
The good news is, the larger the numbers, the easier it is for meaningful patterns to emerge. With a study on 500 or 1,000 people, the association between a gene variant and a certain trait would have to be quite striking—a "strong signal," in genomic terms—to catch researchers' attention. This has happened with conditions in which a single gene or only a handful of genes plays a key role—such as the two BRCA genes in hereditary breast cancer.
With most diseases, however, researchers believe the genetic risk factors are spread across larger numbers of genes, with each gene playing only a modest role, subject to the effects of numerous non-genetic factors as well, such as diet. To detect these signals, researchers have to analyze samples that number at least in the tens of thousands. "These connections are going to be discovered only by looking across many data points," says D'Avolio, "which means you need huge data samples." He says MVP, with its projected million-Veteran patient sample, will "make that possible in a way that no one else can."
Enabling researchers to query MVP data
In the not-too-distant future, says D'Avolio, it could be that computers will automatically run searches in the background, comparing mountains of patient data against knowledge bases that store information on what is already known about certain genes, such as those of the National Center for Biotechnology Information. CPUs—not PhDs—will connect the dots and unearth links between gene variants and specific health traits or risks.
Informatics is not quite there, though. For now, researchers—mere humans—have to run specific queries. D'Avolio describes how the process works:
"Researchers can access GenISIS remotely and ask a question—for example, how many patients do we have consent and blood for, who have been seen in the last two years and who have diabetes? And then they can move that data, with appropriate permissions, into a secure environment with a big high-performance cluster, and huge amounts of storage. So we have a very sophisticated analysis environment."
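The kind of question D'Avolio describes can be pictured as a simple filter over patient records. The sketch below is purely illustrative: the article does not describe how GenISIS stores or queries data, so the record fields, the function name, the sample patients, and the reference date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical, de-identified record; field names are assumptions."""
    patient_id: str
    has_consent: bool
    has_blood_sample: bool
    last_seen: date
    diagnoses: frozenset


def count_eligible(records, condition, as_of, lookback_years=2):
    """Count patients with consent and blood on file, seen within the
    lookback window, who carry the given diagnosis."""
    # Approximate a year as 365 days; good enough for an illustration.
    cutoff = as_of - timedelta(days=365 * lookback_years)
    return sum(
        1
        for r in records
        if r.has_consent
        and r.has_blood_sample
        and r.last_seen >= cutoff
        and condition in r.diagnoses
    )


# Made-up sample data, not real patients.
records = [
    PatientRecord("A", True, True, date(2011, 6, 15), frozenset({"diabetes"})),
    PatientRecord("B", True, True, date(2008, 3, 1), frozenset({"diabetes"})),
    PatientRecord("C", False, True, date(2011, 9, 2), frozenset({"diabetes"})),
    PatientRecord("D", True, True, date(2011, 1, 20), frozenset({"hypertension"})),
    PatientRecord("E", True, True, date(2010, 1, 10), frozenset({"diabetes", "ptsd"})),
]

eligible = count_eligible(records, "diabetes", as_of=date(2011, 12, 1))
print(eligible)
```

In this toy data set, only patients A and E meet every condition. A production system would presumably run such a query against a database rather than an in-memory list, and would enforce the permissions D'Avolio mentions before any data moved into the analysis environment.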
Already, a VA research team is planning to use MVP data for a study on serious mental illness, which affects some 170,000 Veterans who use VA care. While the study is recruiting thousands of Veterans who have schizophrenia or bipolar disorder, MVP would supply "healthy controls" for the sake of comparison. "It's no small thing to get up to 10,000 patients with schizophrenia or bipolar disorder," says D'Avolio. "But then you have to match that with another 10,000 who don't have either disease." He says this study is likely to be "the first scientific contribution of MVP."
MVP data updated through link to patient records
Because the MVP database is linked to VA's electronic health records, it can be periodically refreshed to capture important changes in Veterans' health status, such as new diagnoses or prescriptions.
GenISIS, through its nexus with various VA and non-VA databases, could also fetch specific data relevant to a researcher's question, even if those data are not being "brought over" to the MVP database on a routine basis. "Say you want to access specific lab results for your diabetes study, such as patients' hemoglobin A1C values," explains D'Avolio. "We have those 'hooks,' and we know which patients you want them for, so we bring back that data as well."
D'Avolio emphasizes the roles of his 13 GenISIS coworkers in this far-reaching informatics effort and modestly downplays his own contribution. But he can't hide his enthusiasm over what the project represents for the future of health care, and how it has turbocharged his own professional satisfaction. "You're lucky if in the course of a well-intentioned career you have one or two chances to really make a difference. This is the best shot I have at doing something really meaningful for society."