Distilling Data into Knowledge
Elke Rundensteiner, professor of computer science, above, develops technology that can help extract meaning in real time from vast streams of data, enabling quick decisions even when conditions are changing moment by moment.
by Eileen McCluskey
Tracking thousands of city dwellers during an evacuation. Mining medical data to improve care, resulting in reduced costs. Creating intelligent software that predicts which cancer patients will benefit from surgery. These challenges require tapping into massive databases, which are dramatically expanding as storage devices become ever larger and cheaper. WPI researchers are blazing trails that harness the power, and the societal benefits, inherent in these data mountains.
Saving Lives on the Fly
Elke Rundensteiner, professor of computer science, is developing novel techniques for extracting information from large-scale dynamic data sources on the fly. With a recent $50,000 award from the Hewlett-Packard Labs Research Innovation Program, Rundensteiner and PhD student Mo Liu are collaborating with HP researchers (including Song Wang '08; see profile, page 26) to develop technology that will make it possible to find meaning in enormous volumes of constantly changing data.
Concurrently, she and her students are advancing these technologies for real-time event stream analysis with support from a major award from the National Science Foundation. With a UMass Medical Center/WPI Collaborative Pilot Project grant, they are collaborating with colleagues at UMass to track hygiene compliance by healthcare workers in an intensive care unit to prevent the spread of diseases. Ultimately, the technology will help decision makers instantly assess quickly shifting scenarios: massive evacuations during natural disasters, for example, or runaway infections in hospitals. "Scenarios like these present huge database challenges," Rundensteiner says. "You can’t plan ahead for situations that change moment by moment, and you certainly can't wait to make decisions until all the data is in. Traditional static databases are simply too slow for real-time processing."
During a disaster, when lives can be lost in seconds, information may be streaming in to decision makers from thousands of cell phones and smartphones. Rundensteiner’s tools could help turn that cacophony of data into meaning. "We want to know what the most critical needs are and where they are. Decision makers need to know where to route vital resources such as medical personnel and water."
To provide instantly relevant data to those who need it, Rundensteiner borrows a traditional model called online analytical processing (OLAP), which is typically used with static data warehouses. But instead of sifting through information offline after it has been collected, Rundensteiner's system immediately extracts complex patterns of interest from data streams using what she calls "real-time complex event stream analytics."
The process involves aggregating thousands of "primitive" events (readings from a fleeing crowd, for example) into compact "higher-level" events. By continually summarizing the event stream to extract its essence, the system reduces the data footprint—saving time, storage space, electricity—and lives.
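The summarizing step described above can be illustrated with a minimal Python sketch. It is not Rundensteiner's system: the event fields, the `summarize` function, and the 60-second tumbling window are all assumptions made for illustration. The sketch collapses a time-ordered stream of primitive location readings into one compact count per zone per window.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class PrimitiveEvent:
    timestamp: float  # seconds since the start of the stream (assumed unit)
    zone: str         # hypothetical location label, e.g. a city block


@dataclass(frozen=True)
class ZoneSummary:
    window_start: float  # start time of the window being summarized
    zone: str
    count: int           # number of primitive events seen in that zone


def summarize(events: Iterable[PrimitiveEvent],
              window: float = 60.0) -> Iterator[ZoneSummary]:
    """Collapse time-ordered primitive events into per-zone counts per window."""
    counts: Counter = Counter()
    current: Optional[float] = None
    for event in events:
        start = (event.timestamp // window) * window
        # A reading from a new window closes the previous one: emit its
        # summaries and discard the raw counts to keep memory use small.
        if current is not None and start != current:
            for zone, n in sorted(counts.items()):
                yield ZoneSummary(current, zone, n)
            counts.clear()
        current = start
        counts[event.zone] += 1
    # Flush the final, still-open window once the stream ends.
    if current is not None:
        for zone, n in sorted(counts.items()):
            yield ZoneSummary(current, zone, n)
```

Because `summarize` is a generator, a caller can act on each higher-level event as soon as its window closes instead of waiting for the whole stream. The sketch assumes events arrive in timestamp order; a real stream engine would also have to handle late or out-of-order readings.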
Rundensteiner sees distributed networks as an integral part of her work on streaming data. She has developed a computing cluster at WPI with funds from the National Science Foundation and plans to use these computers for her HP project. "We don't have to rely on one very fast machine to process the data," she says, "but can instead distribute the work among interconnected computers for maximum scalability."
From left, Isa Bar-On, mechanical engineering, Diane Strong, management, and Sharon Johnson, management, are studying the experiences of medical systems in three countries to gain insights that will help other medical practices successfully adopt electronic medical record systems.
Medical Records Join the Digital Age
Improving care while saving time and money is among the aims of an interdisciplinary team of WPI faculty members. Professor Isa Bar-On in mechanical engineering and professor Diane Strong and associate professor Sharon Johnson in management are leading an international three-year study that seeks to help bring the U.S. healthcare system into the digital age by replacing the paper-based systems now used at most medical facilities with electronic medical record (EMR) systems. Going far beyond the billing and scheduling software commonly used by medical practices, EMR systems gather all data relevant to patient care, including test results, medications, and treatment outcomes. Funded by a $750,000 grant from the National Science Foundation, the study includes interviews with medical, management, and support staff, as well as observations of internal assessment discussions, to learn how implementing EMR systems impacts medical providers, patients, and medical operations. "One of our main goals is to develop new insights and best practices to help guide future EMR implementations," Johnson says.
Four primary care sites—two in the United States and one each in Canada and Israel—are participating. Stateside collaborators are Fallon Clinic and UMass Memorial Health Care, both in Central Massachusetts. The four sites have reached different points on the digital records continuum, offering the researchers a range of experience to analyze. In Israel, more than 90 percent of primary care practices use an EMR system, while in the U.S. and Canada fewer than 20 percent of healthcare institutions do.
"Looking at the experience in Israel will give us a reality check," says Bar-On. "We'll learn from people who have been using these systems for more than 10 years. And we will examine how the organization changes in response to the implementation of these systems."
Taking the long-term perspective, the WPI team sees transformative benefits in the transition away from paper. "Higher quality medical care is associated with evidence-based medicine," notes Strong. "The data is the evidence. Once medical practices and hospitals have implemented EMR systems, they can reap the benefits inherent in mining the resulting databases to see what works and what doesn’t."
"In other industries," notes Johnson, "enterprise systems akin to EMR have brought higher quality and lower costs. We’re looking for this win-win in healthcare."
Creating Learning Machines
The algorithms developed by Carolina Ruiz, associate professor of computer science, are designed to allow computers to learn from experience. To the untrained eye, the ones and zeros may seem meaningless. But to Ruiz, the research represents a teachable moment: teaching new skills to computers, that is. The algorithms she builds in collaboration with WPI students, physicians, and other scientists comb through giant digital collections looking for useful patterns.
In two ongoing projects, Ruiz is developing computational models: one seeks signatures of sleep disorders within one of the largest human sleep databases, and the other predicts which genes will be called upon by specific cell types (heart, liver, lungs) to produce proteins needed for healthy functioning. "We have so much data," Ruiz says. "The problem is to create algorithms that excel at extracting patterns that are helpful. This must be done in collaboration with domain experts, such as doctors."
In another of her current projects, Ruiz and her students are collaborating with faculty at UMass Medical School and Boston College to mine data from patients with pancreatic cancer. Comprehensive data on hundreds of patients includes demographics (age, weight, height), family medical history, results of laboratory and diagnostic procedures, and the outcomes of any surgeries.
"Unfortunately, with pancreatic cancer, by the time it’s diagnosed, it has usually metastasized to other organs," says Ruiz. "In many cases, patients die within a year or two after surgery." Still, oncologists know that in some cases surgery will improve the patient’s quality of life enough to warrant the procedure. But it remains a grey area, and the surgical recommendation is not made lightly. To help predict the most likely outcomes, Ruiz's system learns from significant patterns in the data that show relationships between patients’ nonsurgical data and their postsurgical quality of life. As such software is put to work in clinics, she says, "physicians can use the system to decide whether to perform surgery on new patients."
Through work like this, Ruiz says she looks forward to continuing to raise the bar on computer intelligence. "I don’t see any of my projects as finished," she says. "We reach milestones, but there are always new, related research issues to pursue."