Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-0506103-132405


Document Type: thesis
Author Name: Icev, Aleksandar
URN: etd-0506103-132405
Title: DARM: Distance-Based Association Rule Mining
Degree: MS
Department: Computer Science
Advisors
  • Carolina Ruiz, Advisor
  • Elizabeth Ryder, Advisor
  • Stanley Selkow, Reader
  • Micha Hofri, Department Head
Keywords
  • spatial data mining
  • distance-based association rules
  • distance-based Apriori algorithm
Date of Presentation/Defense: 2003-04-25
Availability: unrestricted

    Abstract

    The main goal of this thesis was to develop, implement, and evaluate an algorithm for mining association rules from datasets that contain quantified distance information among the items. This was accomplished by extending and enhancing the Apriori algorithm, the standard algorithm for mining association rules, which cannot mine rules that carry distance information among the items that construct them. This thesis enhances the main Apriori property by requiring the itemsets that form rules to “deviate properly” in addition to satisfying the minimum support threshold. We say that an itemset deviates properly if all pairwise distances among its items are highly conserved across the dataset instances in which these items occur. This thesis introduces the notion of proper deviation and provides the precise procedure and measures that characterize it. Integrating the notions of distance-preserving frequent itemsets and proper deviation into the standard Apriori algorithm yields our Distance-Based Association Rule Mining (DARM) algorithm.
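The abstract does not state the exact deviation measure, so the following Python code is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that each instance records one position per item, that the distance between two items is the absolute difference of their positions, and that "deviates properly" can be approximated by requiring the population standard deviation of every pairwise distance to stay at or below a hypothetical threshold `max_std`. The function names (`supporting`, `deviates_properly`, `darm_sketch`) and the toy data are hypothetical. Following the abstract, an itemset is kept only if it is both frequent and properly deviating, and larger candidates are built only from kept itemsets.

```python
from itertools import combinations
from statistics import pstdev


def supporting(instances, itemset):
    """Return the instances that contain every item of `itemset`."""
    return [inst for inst in instances if all(item in inst for item in itemset)]


def deviates_properly(instances, itemset, max_std):
    """Assumed proxy for proper deviation: for every pair of items, the
    population standard deviation of their distance across the supporting
    instances must not exceed the hypothetical threshold `max_std`."""
    rows = supporting(instances, itemset)
    for a, b in combinations(sorted(itemset), 2):
        dists = [abs(inst[a] - inst[b]) for inst in rows]
        if len(dists) > 1 and pstdev(dists) > max_std:
            return False
    return True


def darm_sketch(instances, min_support, max_std):
    """Level-wise, Apriori-style search that keeps an itemset only if it is
    frequent AND deviates properly. Returns {frozenset(itemset): support}."""
    n = len(instances)

    def score(itemset):
        support = len(supporting(instances, itemset)) / n
        if support >= min_support and deviates_properly(instances, itemset, max_std):
            return support
        return None  # itemset is pruned

    # Level 1: single items.
    level = {}
    for item in sorted({i for inst in instances for i in inst}):
        s = score(frozenset([item]))
        if s is not None:
            level[frozenset([item])] = s
    result = dict(level)

    k = 2
    while level:
        # Join surviving (k-1)-itemsets; keep a candidate only if all of its
        # (k-1)-subsets survived the previous level (Apriori-style pruning).
        candidates = set()
        for x, y in combinations(list(level), 2):
            union = x | y
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = {}
        for cand in candidates:
            s = score(cand)
            if s is not None:
                level[cand] = s
        result.update(level)
        k += 1
    return result


# Hypothetical toy data: each instance maps an item to its position
# (e.g., the offset of a motif in a sequence); one occurrence per item assumed.
instances = [
    {"A": 10, "B": 25, "C": 40},
    {"A": 5, "B": 20, "C": 70},
    {"A": 0, "B": 15},
    {"B": 3, "C": 50},
]
strict = darm_sketch(instances, min_support=0.5, max_std=2.0)
loose = darm_sketch(instances, min_support=0.5, max_std=100.0)
```

With `max_std=2.0`, {A, B} survives because A and B are always 15 positions apart, while {A, C} (support 0.5) and {B, C} (support 0.75) are frequent but rejected because their distances vary widely; with `max_std=100.0`, {A, B, C} is kept with support 0.5. Pruning on the deviation check is a choice of this sketch: unlike support, the standard-deviation proxy is not guaranteed to be anti-monotone, so a subset can fail the check while a superset passes. Rule generation from the kept itemsets is omitted.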

    DARM can be applied to data mining and knowledge discovery in genetic, financial, retail, and time-sequence data, or in any domain where the distance between items is important. This thesis uses gene expression and regulation in eukaryotic organisms as the application domain. Data from this domain were used to produce DARM rules, sets of those rules were used to build predictive models, and the predictive accuracy of those models was tested. In addition, the predictive accuracies of models built with and without distance information were compared.

    Files
  • icev.pdf

