Seeing the Forest for the Trees
Simply put, visualization is a way of taking information and turning it into images that make it easier to comprehend. Road maps, bar charts and organizational charts are examples we encounter in our everyday lives. Scientists use visualization techniques to present their results, confirm hypotheses and extract meaning from their data. In fact, visualization is becoming increasingly important in science, engineering and business because it can provide rich overviews of data and help researchers quickly see the forest for the trees.
But commonly used visualization methods are often inadequate for dealing with very large data sets--the kind that contain millions or even tens of millions of records, each with hundreds or thousands of entries. Revealing the patterns and trends hidden in such vast seas of numbers is the specialty of Matthew Ward '77, professor of computer science at WPI, and Elke Rundensteiner, associate professor of computer science.
Their research focuses on the development of interactive visualization and data management techniques that permit scientists to explore massive quantities of data. Ward, who has been working in visualization for more than a decade, is the developer of XmdvTool, a powerful tool for the interactive analysis of large multivariate data sets. The public domain software takes advantage of the ability of the human eye to detect, isolate and classify clusters, trends and anomalies within visual patterns. It integrates a variety of multivariate data visualization techniques, including scatterplot matrices, parallel coordinates, star glyphs and dimensional stacking, along with an extensive suite of interactive tools for filtering the data and modifying the views.
This chart, generated by XmdvTool from a data set consisting of attributes for several hundred cars, demonstrates a capability of the software called parallel coordinates. It shows six dimensions of the data at an intermediate level of detail. Each vertical axis corresponds to a dimension (for example, miles per gallon and number of cylinders). Each colored line represents a cluster (or correlation among a grouping of cars) within the data space. The band around each line shows the spread of values for each dimension within that cluster. For example, the purple line represents eight-cylinder cars with poor fuel economy and low acceleration. Most of the clusters differentiate themselves from others in more than one dimension, allowing the viewer to divide each dimension into ranges (such as low, medium and high) and spot trends and outliers.
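For readers who want to see the idea in miniature, the mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XmdvTool's own code: the four car records, the cluster labels and the function names `normalize_columns` and `cluster_polyline` are all hypothetical. Each dimension is scaled to a common axis height, and each cluster is reduced to a center line plus a band that spans its lowest and highest values on every axis.

```python
from statistics import mean


def normalize_columns(records):
    """Scale every dimension to [0, 1] so all vertical axes share one height."""
    dims = list(zip(*records))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5
         for v, lo, hi in zip(rec, lows, highs)]
        for rec in records
    ]


def cluster_polyline(members):
    """For one cluster, return (low, center, high) on each axis.

    The center values form the cluster's line; low and high bound its band.
    """
    return [(min(d), mean(d), max(d)) for d in zip(*members)]


# Hypothetical records: (miles per gallon, cylinders, acceleration)
cars = [
    (12.0, 8, 11.0),
    (14.0, 8, 12.0),
    (30.0, 4, 16.0),
    (34.0, 4, 18.0),
]
# Hypothetical cluster labels, assumed to come from an earlier clustering step
labels = [0, 0, 1, 1]

scaled = normalize_columns(cars)
clusters = {}
for rec, label in zip(scaled, labels):
    clusters.setdefault(label, []).append(rec)

polylines = {label: cluster_polyline(m) for label, m in clusters.items()}
for label, line in polylines.items():
    print(label, line)
```

A drawing layer would then place one vertical axis per dimension, connect the center values with a colored line and shade the region between the low and high values.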
Funded by the National Science Foundation since 1998, Ward and Rundensteiner's current research is focused on three interconnected tasks. First, they are extending the visualization techniques of XmdvTool to permit it to display millions of records with thousands of dimensions in meaningful clusters that can be examined at multiple levels of detail. Second, they hope to improve the software's data management and retrieval capabilities. Third, they hope to develop interactive tools that allow users to better navigate the data display and control the level of detail by drilling down, rolling up and zooming.
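One way to picture drilling down and rolling up is as a walk through a tree of nested clusters. The Python sketch below is an assumption made for illustration and does not describe how XmdvTool is implemented: the `ClusterNode` class, the `clusters_at_depth` function and the sample tree are all hypothetical. Choosing a deeper level shows more, smaller clusters; choosing a shallower one merges them back into their parents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    """One cluster in a hypothetical hierarchy of nested clusters."""
    name: str
    size: int  # number of records the cluster contains
    children: List["ClusterNode"] = field(default_factory=list)


def clusters_at_depth(node, depth):
    """Return the clusters to display at the requested level of detail.

    Depth 0 shows only the given node. Larger depths drill down one level
    at a time; clusters with no children are shown as they are.
    """
    if depth == 0 or not node.children:
        return [node]
    shown = []
    for child in node.children:
        shown.extend(clusters_at_depth(child, depth - 1))
    return shown


# Hypothetical hierarchy, for illustration only
root = ClusterNode("all cars", 6, [
    ClusterNode("economy", 3, [
        ClusterNode("compact", 2),
        ClusterNode("sport", 1),
    ]),
    ClusterNode("large", 3),
])

for depth in range(3):
    print(depth, [c.name for c in clusters_at_depth(root, depth)])
```

In this sketch the record counts always add up to the same total at every depth, so moving between levels changes the granularity of the display without dropping any data.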
"Visualization is not meant to replace the traditional analytical or statistical methods of data analysis currently used," Ward says, "but it is a useful tool for understanding the structure and characteristics of a given data set. Visualization is 'exploratory analysis.' It allows you to use your innate visual pattern recognition abilities to spot clusters, trends, and anomalies that direct you toward the 5 percent of the data that is important, while letting you bypass the 95 percent that is not."
XmdvTool currently has hundreds of users from a wide variety of application domains, including environmental monitoring, stock market analysis and bioinformatics. It is also used in visual data mining research and in information visualization graduate courses at several universities. Ward says feedback from users has been invaluable, as each new domain provides him, through its own unique data characteristics and exploratory tasks, new opportunities for taking visualization to yet another level of exploration.
Last modified: Jul 02, 2010, 10:28 EDT