Document Type: dissertation
Author Name: Varde, Aparna S
URN: etd-081506-152633
Title: Graphical Data Mining for Computational Estimation in Materials Science Applications
Degree: PhD
Department: Computer Science
Advisors: Elke A. Rundensteiner, Advisor; David C. Brown, Committee Member; Carolina Ruiz, Committee Member; Neil T. Heffernan, Committee Member; Richard D. Sisson Jr., Committee Member
Keywords: Classification, Data Mining, Domain Semantics, Graphical Data, Clustering
Date of Presentation/Defense: 2006-08-15
Availability: unrestricted

Abstract

In domains such as Materials Science, experimental results are often plotted as two-dimensional graphs of a dependent versus an independent variable to aid visual analysis. Performing laboratory experiments with specified input conditions and plotting such graphs consumes significant time and resources, motivating the need for computational estimation. The goals are to estimate the graph obtained in an experiment given its input conditions, and to estimate the conditions needed to obtain a desired graph. State-of-the-art estimation approaches were not found suitable for the targeted applications.

In this dissertation, an estimation approach called AutoDomainMine is proposed. In AutoDomainMine, graphs from existing experiments are clustered and decision tree classification is used to learn the conditions characterizing these clusters in order to build a representative pair of input conditions and graph per cluster. This forms knowledge discovered from existing experiments. Given the conditions of a new experiment, the relevant decision tree path is traced to estimate its cluster. The representative graph of that cluster is the estimated graph. Alternatively, given a desired graph, the closest matching representative graph is found. The conditions of the corresponding representative pair are the estimated conditions.
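The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the clustering step is a small k-means with Euclidean distance, the classifier is a basic ID3-style tree over categorical conditions, and the representative pair of a cluster is taken to be the experiment whose graph lies closest to the cluster mean. All function names, attribute names (`quenchant`, `agitation`), and data values are hypothetical.

```python
import math
from collections import Counter

def dist(g1, g2):
    """Euclidean distance between two graphs sampled at the same x-values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def mean_graph(graphs):
    return [sum(col) / len(col) for col in zip(*graphs)]

def kmeans(graphs, k, iters=20):
    """Tiny k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [graphs[0]]
    while len(centers) < k:
        centers.append(max(graphs, key=lambda g: min(dist(g, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(g, centers[j])) for g in graphs]
        for j in range(k):
            members = [g for g, l in zip(graphs, labels) if l == j]
            if members:
                centers[j] = mean_graph(members)
    return labels

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """ID3-style tree: split on the attribute with the lowest weighted entropy."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: a cluster id
    def split_entropy(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total
    best = min(attrs, key=split_entropy)
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, row):
    """Trace the decision tree path for the given conditions to a cluster id."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

def representatives(conditions, graphs, labels):
    """One (conditions, graph) pair per cluster: the member nearest the cluster mean."""
    reps = {}
    for j in set(labels):
        idx = [i for i, l in enumerate(labels) if l == j]
        center = mean_graph([graphs[i] for i in idx])
        best = min(idx, key=lambda i: dist(graphs[i], center))
        reps[j] = (conditions[best], graphs[best])
    return reps

def estimate_graph(tree, reps, new_conditions):
    return reps[classify(tree, new_conditions)][1]

def estimate_conditions(reps, desired_graph):
    return min(reps.values(), key=lambda rep: dist(rep[1], desired_graph))[0]

# Hypothetical toy experiments: conditions and graphs are made-up values.
conditions = [
    {"quenchant": "water", "agitation": "high"},
    {"quenchant": "water", "agitation": "low"},
    {"quenchant": "oil", "agitation": "high"},
    {"quenchant": "oil", "agitation": "low"},
]
graphs = [[9.0, 8.0, 7.0], [8.8, 8.1, 7.2], [3.0, 2.0, 1.0], [3.2, 2.1, 0.9]]

labels = kmeans(graphs, k=2)
tree = build_tree(conditions, labels, ["quenchant", "agitation"])
reps = representatives(conditions, graphs, labels)

print(estimate_graph(tree, reps, {"quenchant": "water", "agitation": "low"}))
print(estimate_conditions(reps, [3.0, 2.0, 1.0]))
```

The two `print` calls correspond to the two estimation directions: conditions to graph via the tree path, and graph to conditions via the nearest representative graph.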

One sub-problem of this dissertation is preserving the semantics of graphs during clustering. This is addressed through our proposed technique, LearnMet, which learns domain-specific distance metrics for graphs. Starting from a guessed initial metric used in any fixed clustering algorithm, LearnMet iteratively compares actual and predicted clusters over a training set and refines the metric until the error between actual and predicted clusters is minimal or below a given threshold. Another sub-problem is capturing the relevant details of each cluster through its representative while still conveying concise information. This is addressed by our proposed methodology, DesRept, for designing semantics-preserving cluster representatives by capturing various levels of detail in the cluster, taking into account ease of interpretation and information loss based on the interests of targeted users.
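The iterative loop behind LearnMet might look roughly like the following sketch. The abstract does not specify the form of the metric, the error measure, or the refinement rule, so everything concrete here is an assumption: the metric is a weighted sum of two hypothetical components (mean point-wise difference and difference in peak height), the error is the fraction of graph pairs whose same-cluster/different-cluster status disagrees, and the update rule simply lowers weights for pairs wrongly separated and raises them for pairs wrongly merged. The clustering procedure is a fixed farthest-point seeding with nearest-seed assignment, and the loop stops at the threshold or at an epoch cap.

```python
from itertools import combinations

def components(g1, g2):
    """Hypothetical distance components: point-wise difference and peak-height difference."""
    pointwise = sum(abs(a - b) for a, b in zip(g1, g2)) / len(g1)
    peak = abs(max(g1) - max(g2))
    return (pointwise, peak)

def cluster(graphs, k, weights):
    """Fixed clustering algorithm: farthest-point seeds, then nearest-seed assignment."""
    def d(a, b):
        return sum(w * c for w, c in zip(weights, components(a, b)))
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(graphs)),
                         key=lambda i: min(d(graphs[i], graphs[s]) for s in seeds)))
    return [min(range(k), key=lambda j: d(g, graphs[seeds[j]])) for g in graphs]

def pair_error(actual, predicted):
    """Fraction of pairs whose same/different-cluster status disagrees, plus those pairs."""
    pairs = list(combinations(range(len(actual)), 2))
    wrong = [(i, j) for i, j in pairs
             if (actual[i] == actual[j]) != (predicted[i] == predicted[j])]
    return len(wrong) / len(pairs), wrong

def learn_metric(graphs, actual, k, weights, threshold=0.0, rate=0.1, max_epochs=50):
    """Refine the weights until the error is at or below the threshold (or epochs run out)."""
    weights = list(weights)
    for _ in range(max_epochs):
        error, wrong = pair_error(actual, cluster(graphs, k, weights))
        if error <= threshold:
            break
        for i, j in wrong:
            # Wrongly separated pair: shrink its distance. Wrongly merged pair: grow it.
            sign = -1.0 if actual[i] == actual[j] else 1.0
            for n, c in enumerate(components(graphs[i], graphs[j])):
                weights[n] = max(0.0, weights[n] + sign * rate * c)
    final_error = pair_error(actual, cluster(graphs, k, weights))[0]
    return weights, final_error

# Hypothetical training set: the actual clusters follow peak height, not peak position.
graphs = [[1, 5, 1, 1], [1, 1, 5, 1], [1, 3, 1, 1], [1, 1, 3, 1]]
actual = [0, 0, 1, 1]

initial = [1.0, 0.1]  # guessed initial metric
initial_error = pair_error(actual, cluster(graphs, 2, initial))[0]
learned, final_error = learn_metric(graphs, actual, 2, initial)
print(initial_error, final_error, learned)
```

In this toy set the initial guess groups graphs by where the peak occurs, while the actual clusters depend on how high it is; the refinement shifts weight toward the peak-height component until the predicted clusters agree with the actual ones.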

The tool developed using AutoDomainMine is rigorously evaluated with real data in the Heat Treating domain that motivated this dissertation. Formal user surveys comparing the estimation with the laboratory experiments indicate that AutoDomainMine provides satisfactory estimation.

Files: avarde.pdf
