February 23, 2017
Patrick O'Connor

They call it “mining” for a reason. Like prospectors driven to dig through mountains seeking a small, yet valuable vein, data scientists, technologists, and medical researchers see tantalizing opportunities hidden deep in the mountains of digital medical data piling up around the world. They have no doubt that clues to better treatments—perhaps even cures—for many diseases lie undiscovered within the data. At WPI today, researchers are pushing the boundaries of data science and technology development to bring to the surface new knowledge that will help clinicians and patients.

Information Overload

When a medication harms instead of heals, a report is made to the U.S. Food and Drug Administration (FDA) so the agency can act to prevent others from suffering the same fate. That’s the theory. In practice the system is far from foolproof, due to the huge volume of data flowing in to the FDA and the almost impossible challenge of extracting key information from that deluge of electronic reports.

“It’s information overload,” says Elke Rundensteiner, PhD, professor of computer science and founding director of WPI’s Data Science Program. And making matters worse, she says, “the information they need to know is buried deep in the data.”

Rundensteiner and a team of students are collaborating with the FDA to help them tackle this problem. The project began in the summer of 2015 when Marni Hall ’97, PhD, MPH, now senior vice president of research and development, informatics, and policy at PatientsLikeMe, was in charge of the office that runs the FDA’s Adverse Event Reporting System. As a member of WPI’s Arts and Sciences Advisory Board, Hall was familiar with WPI’s faculty and its research programs. “Marni explained the problem the FDA was having and asked if we could help,” Rundensteiner says. “This led to a working relationship between the university and the FDA. Now we have two PhD students who are being supported by a fellowship to work on this important project.”

With a digital archive that already contains 10 million adverse event reports dating back to 1969, the FDA receives another 1.5 million new reports every year. “The first problem the FDA staff faces is which reports to read, because they can’t closely investigate them all,” Rundensteiner says.

Searching for Lifesaving Information

With support from the NSF and major technology companies, Elke Rundensteiner leads a research group focused on very large database and information systems in support of advanced applications in business, engineering, and the sciences. In the area of health care, she and her team are working with the Food and Drug Administration to develop machine learning algorithms that can pore over reports on incidents involving drugs and medical devices to bring critical patterns to the surface.

Furthermore, each new report needs to be compared in meaningful ways to the archive to look for patterns that could shed light on the case. “So one of our long-term goals is to develop a data exploration system that will read all of the new reports, relate them to the archives and to other pieces of related information, and then identify the critical reports the examiner should focus on,” she says.

One major challenge concerns the way the data is “structured.” The online adverse event reporting form has a number of fields and drop-down items that physicians, patients, or medical device providers can use to input data about the patient and the adverse event. That is known as structured data because the information is uniform across all reports and is easily mined. But the form also includes a text box where physicians can write a narrative about the event, using their own style and grammar. This “unstructured data” is difficult for a computer to parse. “The reality is, there is missing information in that structured data,” Rundensteiner says, “with additional valuable information to help understand the particular case often hidden in the narrative.”

The WPI team aims to develop machine-learning algorithms that will be able to “read” the doctors’ notes and extract the key concepts characterizing the adverse events. The key words will be assembled into a structured format so that they can be mined. “Dealing with natural language is very complex,” she says, “which makes this an extremely challenging albeit important problem.”

For example, a note may include the name of a medication and the word “rash.” Does that mean the patient had a rash and used the medication to treat it? Or was the rash a reaction to a medication taken for another purpose? Context is key, and most cases are not as simple as one patient taking one medication for the first time.
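
The ambiguity described above can be illustrated with a toy sketch. This is not the WPI team's method, which relies on machine learning; it is a plain dictionary lookup, and the vocabularies, the placeholder drug name "drugx," and the function name are all hypothetical. It shows why keyword matching alone is not enough: two narratives with opposite meanings produce the same structured record.

```python
import re

# Hypothetical vocabularies, for illustration only.
DRUG_TERMS = {"drugx"}
EVENT_TERMS = {"rash", "nausea", "headache"}


def extract_concepts(narrative):
    """Return the drug and event terms that appear in a free-text narrative."""
    tokens = set(re.findall(r"[a-z]+", narrative.lower()))
    return {
        "drugs": sorted(tokens & DRUG_TERMS),
        "events": sorted(tokens & EVENT_TERMS),
    }


# In the first note the drug treats the rash; in the second it may have caused it.
treated = extract_concepts("Patient presented with a rash; DrugX was given for relief.")
caused = extract_concepts("Patient developed a rash two days after starting DrugX.")

print(treated)
print(caused)
print(treated == caused)  # True: the lookup cannot tell the two cases apart
```

Resolving which reading is correct requires modeling the context around the matched words, which is the hard part of the problem Rundensteiner describes.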

“This is a very challenging project,” Rundensteiner says, “and it deals with a widespread issue with important ramifications well beyond this particular FDA scenario. I have no doubt that there is information deep in the data that could help many people. We just have to find it.”

Machine Learning

“We design intelligent algorithms that can learn from experience, and that experience is represented as data,” says Carolina Ruiz, PhD, associate professor of computer science.

Working in collaboration with clinicians at the University of Massachusetts Medical School (UMMS) and UMass Memorial Medical Center (UMMC), both in Worcester, Ruiz has mined the medical records of 500 pancreatic cancer surgery patients for information that might help physicians predict how other patients will fare after pancreatic surgery.

But before that quest could begin, she and her team had to collect and integrate the medical records consistently and enter them into a structured database. “Preparing the data is a time-consuming but essential task,” she says. Once it was structured, Ruiz developed and applied machine learning algorithms to sift through the data and bring to light features of the medical records that were good predictors of surgical outcomes.

After the key features were selected, she “trained” the algorithm by running it on small portions of the data and evaluating its performance against the patients’ known outcomes. Adjustments were made to help the algorithm learn better, and then it was applied to the entire data set. For comparison, a group of physicians who treat pancreatic cancer were asked which features they use to predict a patient’s likely outcome.
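
The article does not say which evaluation scheme Ruiz's team used. As a minimal sketch of the general idea, the example below assumes k-fold cross-validation, a standard way to train on part of a data set and score predictions against known outcomes for the rest. The patient data is synthetic, and the feature names, the simple threshold rule, and every function name are hypothetical.

```python
import random


def make_synthetic_patients(n, seed=0):
    """Build (features, outcome) pairs; only 'informative' tracks the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        outcome = rng.random() < 0.5
        features = {
            "informative": rng.gauss(2.0 if outcome else 0.0, 1.0),
            "noise": rng.gauss(0.0, 1.0),
        }
        rows.append((features, outcome))
    return rows


def fit_threshold(train, feature):
    """Learn a cut point midway between the two class means."""
    pos = [x[feature] for x, y in train if y]
    neg = [x[feature] for x, y in train if not y]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    return (pos_mean + neg_mean) / 2, pos_mean > neg_mean


def cross_validated_accuracy(rows, feature, k=5):
    """Hold out each fold in turn, train on the rest, and score the held-out fold."""
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        threshold, positive_is_high = fit_threshold(train, feature)
        for x, y in held_out:
            predicted = (x[feature] > threshold) == positive_is_high
            correct += predicted == y
    return correct / len(rows)


patients = make_synthetic_patients(200)
for name in ("informative", "noise"):
    print(name, round(cross_validated_accuracy(patients, name), 2))
```

Comparing held-out accuracy across candidate features is one simple way to see which features are good predictors; a feature unrelated to the outcome should score near chance.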

“The physicians were surprised,” Ruiz says. “They had all selected different features than the algorithm, based on their experience and intuition. But the algorithm had better performance with the features it selected.”

Among the important features the algorithm focused on were the number of drains used during the surgery, the amount of postoperative bleeding, and the number of days until the patient was able to resume a regular diet. “Pancreatic cancer is a very difficult disease to treat,” she says, “so it is helpful for physicians to have this data-inferred knowledge as they develop treatment plans for their patients.”

Using similar approaches, Ruiz and her team are mining data from patients with sleep disorders, searching for patterns and identifying features that could lead to improved treatments. In an ongoing partnership with neurologists at UMMS, she has established a structured database that currently contains the medical records of 1,000 sleep disorder patients — about half a gigabyte of data for each patient. “One sleep study records data from 55 sensors worn by the patient for the entire night’s sleep, which generate an enormous amount of data,” she notes.

Unlike the pancreatic surgery project, which sought to answer a specific question, the sleep data mining project is open-ended and “unsupervised,” she says. Her research group has developed techniques that are able to automatically discover patterns across a wide spectrum of patient data: patterns that relate demographic information, medical history, family history of disease, exercise habits, drinking and smoking habits, biomedical signals, medical treatments, and medications. “In this case, we did not tell the algorithm how to organize the data,” she says, “because we didn’t want to bias it and prevent it from discovering novel patterns on its own. If we knew what the important patterns were, we wouldn’t need machine learning.”

Using these unsupervised techniques, Ruiz’s group has discovered novel patient subpopulations that exhibit distinct medical and behavioral properties. “By analyzing the subpopulations uncovered by our algorithm,” she says, “we determined that they can be characterized by their dynamic sleep properties — high vs. low efficiency — and that static properties, including age, collar size, smoking frequency, heart disease, and BMI [body mass index], differ across these populations in a statistically significant manner.”
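
The article does not name the algorithm behind these subpopulations. As a rough illustration of unsupervised grouping, the sketch below swaps in k-means clustering, a common technique, and applies it to a single synthetic "sleep efficiency" value per patient. The data, the group means, and the function name are assumptions made for the example; the real project spans many features per patient.

```python
import random


def kmeans_1d(values, k=2, iterations=20):
    """Cluster numbers into k groups (k >= 2) without using any labels."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centers evenly across the range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


# Synthetic sleep-efficiency values drawn from two made-up groups.
rng = random.Random(1)
efficiency = [rng.gauss(0.60, 0.05) for _ in range(100)]
efficiency += [rng.gauss(0.90, 0.03) for _ in range(100)]
rng.shuffle(efficiency)

centers, clusters = kmeans_1d(efficiency, k=2)
print([round(c, 2) for c in centers])
print([len(members) for members in clusters])
```

The algorithm is never told which patient came from which group, yet it recovers a low-efficiency and a high-efficiency cluster. Once groups like these are found, other properties such as age or BMI can be compared across them.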

The project is ongoing. Ruiz says she expects that many other medically meaningful patterns will be uncovered that will shed light on the nature and treatment of sleep disorders.

Translating Data for Patients


In WPI’s Foisie Business School, professor Diane Strong, PhD, and associate professor Bengisu Tulu, PhD, are leading several teams developing smartphone-based applications that extend the impact of knowledge gleaned from digital health data.

“All the apps you see on the market today are trackers: tracking physical activity, tracking what you eat,” Tulu says. “With our apps, tracking is just the start.”

Tulu and Strong partner with clinicians and researchers at UMMS to embed evidence-based medical guidance in their apps to provide users with clinically sound prompts and action items. “In our view, a health app has to do more than tell you your numbers,” Strong says. “It should also be able to give you evidence-based information that will help you manage your own care.”

Among the smartphone apps developed by the WPI teams, which include several other WPI faculty members and students (graduate and undergraduate), are Sugar, which helps people with type 2 diabetes control their blood glucose levels and monitor severe foot ulcers, and RELAX, which helps people lose weight and overcome stress. Both are in early clinical testing with patients at UMMC.

One of the latest apps leverages a national database managed by UMMS on outcomes of total joint replacement surgeries. Patients will track their pain levels and other metrics on a daily basis using their smartphones. When they come to the clinic, the app will summarize their data and relate it to cases in the database, to give the clinician better information to assess progress and plan treatment. The app will soon be tested with UMMC patients.

Apps to Help Manage Your Health

As part of their overall research interest in how technology can improve the delivery of healthcare, Diane Strong and Bengisu Tulu in the Foisie School of Business create apps that give patients more control over their own well-being, reduce the need for doctor visits and other interventions, and improve the quality of care that physicians and other healthcare providers can deliver.

“We hope the app will make the time patients spend with their physicians more meaningful,” Tulu says. “Currently, a lot of time is used going over the patient’s pain history. With the app, that information will be summarized for the provider and set in context with the known trends from the database. So more time will be available for patient and provider to discuss treatment and answer questions.”

Learning from early patient usage of the apps in development, Tulu and Strong continue to explore ways to optimize the user experience and present data in ways that are relevant for both the patient and the healthcare provider. “Physicians and patients need different data and they expect it to be presented in different ways. We are still working on closing that gap,” Tulu says. “Usability, visual design, and novelty are all important. You can’t just bring people back to the same message all the time. They will get bored or discouraged and stop using the app.”

In addition to a firm foundation in clinical data, building an effective app requires a multidisciplinary team with data scientists, software engineers, web developers, cybersecurity experts, and digital designers, Tulu says.

“You only see your doctor a couple of times a year, if that,” Strong says. “Your health is not your doctor’s responsibility, it’s your responsibility. So our aim is to build out a platform and standards for apps that can help people better manage their own well-being.”

And whether the task is building more useful health management apps, mining data for leads on improved treatments, or developing tools to surface critical health information, promoting well-being and helping patients have the best possible outcomes from their interactions with the healthcare system is the goal that continues to guide WPI’s researchers.

First Published in WPI Research, 2017 edition