Document Type: dissertation
Author Name: Varde, Aparna S
URN: etd-081506-152633
Title: Graphical Data Mining for Computational Estimation in Materials Science Applications
Degree: PhD
Department: Computer Science
Advisors: Elke A. Rundensteiner, Advisor; David C. Brown, Committee Member; Carolina Ruiz, Committee Member; Neil T. Heffernan, Committee Member; Richard D. Sisson Jr., Committee Member
Keywords: Classification Data Mining Domain Semantics Graphical Data Clustering
Date of Presentation/Defense: 2006-08-15
Availability: unrestricted
Abstract
In domains such as Materials Science, experimental results are often plotted as two-dimensional graphs of a dependent versus an independent variable to aid visual analysis. Performing laboratory experiments with specified input conditions and plotting such graphs consumes significant time and resources, motivating the need for computational estimation. The goals are to estimate the graph obtained in an experiment given its input conditions, and to estimate the conditions needed to obtain a desired graph. State-of-the-art estimation approaches were found unsuitable for the targeted applications.
In this dissertation, an estimation approach called AutoDomainMine is proposed. In AutoDomainMine, graphs from existing experiments are clustered and decision tree classification is used to learn the conditions characterizing these clusters in order to build a representative pair of input conditions and graph per cluster. This forms knowledge discovered from existing experiments. Given the conditions of a new experiment, the relevant decision tree path is traced to estimate its cluster. The representative graph of that cluster is the estimated graph. Alternatively, given a desired graph, the closest matching representative graph is found. The conditions of the corresponding representative pair are the estimated conditions.
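The two estimation directions described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes the clustering and decision tree learning steps have already produced a tree and one representative pair per cluster. The tree, the attribute names (`quenchant`, `agitation`), their values, the graph values, and all function names are made up for the example, and plain Euclidean distance stands in for whatever graph distance the real system uses.

```python
import math

# Hypothetical decision tree learned from clustered experiments.
# Internal nodes test one input condition; leaves name a cluster.
TREE = {
    "attribute": "quenchant",
    "branches": {
        "oil": {"cluster": "A"},
        "water": {
            "attribute": "agitation",
            "branches": {"high": {"cluster": "B"}, "low": {"cluster": "C"}},
        },
    },
}

# Hypothetical representative pair per cluster: (input conditions, graph).
# Each graph is a list of y-values sampled at the same x positions.
REPRESENTATIVES = {
    "A": ({"quenchant": "oil"}, [1.0, 3.0, 2.0]),
    "B": ({"quenchant": "water", "agitation": "high"}, [2.0, 8.0, 4.0]),
    "C": ({"quenchant": "water", "agitation": "low"}, [2.0, 5.0, 3.0]),
}


def trace_cluster(node, conditions):
    """Follow the tree path selected by the given conditions down to a leaf."""
    while "cluster" not in node:
        node = node["branches"][conditions[node["attribute"]]]
    return node["cluster"]


def estimate_graph(conditions):
    """Conditions -> graph: return the representative graph of the traced cluster."""
    return REPRESENTATIVES[trace_cluster(TREE, conditions)][1]


def estimate_conditions(desired_graph):
    """Graph -> conditions: return the conditions of the closest representative graph.

    Euclidean distance is an assumption made for this sketch only.
    """
    closest = min(
        REPRESENTATIVES.values(),
        key=lambda pair: math.dist(pair[1], desired_graph),
    )
    return closest[0]


if __name__ == "__main__":
    print(estimate_graph({"quenchant": "water", "agitation": "low"}))
    print(estimate_conditions([2.0, 7.5, 4.0]))
```

Keeping a single representative pair per cluster is what makes both directions cheap: the forward direction is one tree traversal, and the reverse direction is a nearest-neighbor search over as many graphs as there are clusters.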
One sub-problem of this dissertation is preserving the semantics of graphs during clustering. This is addressed through our proposed technique, LearnMet, which learns domain-specific distance metrics for graphs. Starting from a guessed initial metric used in any fixed clustering algorithm, LearnMet iteratively compares actual and predicted clusters over a training set and refines the metric until the error between actual and predicted clusters is minimal or below a given threshold. Another sub-problem is capturing the relevant details of each cluster through its representative while still conveying concise information. This is addressed by our proposed methodology, DesRept, for designing semantics-preserving cluster representatives. DesRept captures various levels of detail in the cluster, taking into account ease of interpretation and information loss based on the interests of the targeted users.
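The iterative loop described for LearnMet can be sketched as below. The abstract does not give the metric's components or the refinement rule, so everything specific here is an assumption made for illustration: the metric is a weighted sum of two made-up components (mean point-wise gap and gap between peak values), the fixed clustering algorithm is single-link agglomerative merging, error is the fraction of graph pairs grouped differently in the actual and predicted clusters, and the weight update is a simple heuristic. All function names are hypothetical.

```python
from itertools import combinations


def components(g1, g2):
    """Assumed distance components between two graphs (equal-length y-value lists)."""
    pointwise = sum(abs(a - b) for a, b in zip(g1, g2)) / len(g1)
    peak = abs(max(g1) - max(g2))
    return [pointwise, peak]


def distance(weights, g1, g2):
    """Weighted sum of the components; the weights are what gets learned."""
    return sum(w * d for w, d in zip(weights, components(g1, g2)))


def cluster(graphs, weights, k):
    """Fixed clustering algorithm: single-link agglomerative merging down to k groups."""
    groups = [[i] for i in range(len(graphs))]
    while len(groups) > k:
        _, a, b = min(
            (min(distance(weights, graphs[i], graphs[j])
                 for i in groups[a] for j in groups[b]), a, b)
            for a, b in combinations(range(len(groups)), 2)
        )
        groups[a].extend(groups[b])
        del groups[b]
    labels = [0] * len(graphs)
    for label, members in enumerate(groups):
        for i in members:
            labels[i] = label
    return labels


def mismatched_pairs(actual, predicted):
    """Pairs wrongly grouped together (fp) and pairs wrongly split apart (fn)."""
    fp, fn = [], []
    for i, j in combinations(range(len(actual)), 2):
        same_actual = actual[i] == actual[j]
        same_predicted = predicted[i] == predicted[j]
        if same_predicted and not same_actual:
            fp.append((i, j))
        elif same_actual and not same_predicted:
            fn.append((i, j))
    return fp, fn


def clustering_error(graphs, actual, weights, k):
    """Fraction of graph pairs on which actual and predicted clusters disagree."""
    fp, fn = mismatched_pairs(actual, cluster(graphs, weights, k))
    return (len(fp) + len(fn)) / (len(actual) * (len(actual) - 1) // 2)


def learn_metric(graphs, actual, k, initial_weights,
                 threshold=0.0, max_epochs=20, rate=0.5):
    """Refine the weights until the error reaches the threshold or epochs run out."""
    weights = list(initial_weights)
    n_pairs = len(actual) * (len(actual) - 1) // 2
    best_weights, best_error = list(weights), None
    for _ in range(max_epochs):
        fp, fn = mismatched_pairs(actual, cluster(graphs, weights, k))
        error = (len(fp) + len(fn)) / n_pairs
        if best_error is None or error < best_error:
            best_weights, best_error = list(weights), error
        if error <= threshold:
            break
        # Assumed heuristic: raise weights on components that separate wrongly
        # merged pairs, lower weights on components that separate wrongly split pairs.
        for pairs, sign in ((fp, 1.0), (fn, -1.0)):
            totals = [0.0] * len(weights)
            for i, j in pairs:
                for c, d in enumerate(components(graphs[i], graphs[j])):
                    totals[c] += d
            scale = sum(totals)
            if scale > 0:
                weights = [w + sign * rate * t / scale
                           for w, t in zip(weights, totals)]
        weights = [max(0.0, w) for w in weights]
    return best_weights, best_error


if __name__ == "__main__":
    # Toy training set: the actual clusters depend on the peak, not the overall level.
    graphs = [[0, 0, 9], [4, 4, 9], [0, 0, 5], [4, 4, 5]]
    actual = [0, 0, 1, 1]
    print(clustering_error(graphs, actual, [1.0, 0.0], k=2))
    print(learn_metric(graphs, actual, k=2, initial_weights=[1.0, 0.0]))
```

The sketch returns the best weights seen rather than the last ones, so the learned metric is never worse on the training set than the initial guess.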
The tool developed using AutoDomainMine is rigorously evaluated with real data in the Heat Treating domain that motivated this dissertation. Formal user surveys comparing the estimation with the laboratory experiments indicate that AutoDomainMine provides satisfactory estimation.
Files avarde.pdf