Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-0130104-115506


Document Typethesis
Author NameEl-Sayed, Maged F
URNetd-0130104-115506
TitleAn Efficient and Incremental System to Mine Contiguous Frequent Sequences
DegreeMS
DepartmentComputer Science
Advisors
  • Carolina Ruiz, Advisor
  • Elke A. Rundensteiner, Reader
  • Michael Gennert, Department Head
  • Keywords
  • Frequent Patterns
  • Traversal Patterns
  • Date of Presentation/Defense2004-01-30
    Availability unrestricted

    Abstract

    Mining frequent patterns is an important component of many

    prediction systems. One common usage in web applications is the

    mining of users' access behavior for the purpose of predicting and

    hence pre-fetching the web pages that the user is likely to visit.

    Frequent sequence mining approaches in the literature are often

    based on the use of an Apriori-like candidate generation strategy,

    which typically requires numerous scans of the potentially huge

    sequence database. In this paper we instead introduce a more

    efficient strategy for discovering frequent patterns in sequence

    databases that requires only two scans of the database. The first

    scan obtains support counts for subsequences of length two. The

    second scan extracts potentially frequent sequences of any length

    and represents them as a compressed frequent sequences tree

    structure (FS-tree). Frequent sequence patterns are then mined

    from the FS-tree. Incremental and interactive mining

    functionalities are also facilitated by the FS-tree. As part of

    this work, we developed the FS-Miner, an system that discovers

    frequent sequences from web log files. The FS-Miner has the

    ability to adapt to changes in users' behavior over time, in the

    form of new input sequences, and to respond incrementally without

    the need to perform full re-computation. Our system also allows

    the user to change the input parameters (e.g., minimum support and

    desired pattern size) interactively without requiring full

    re-computation in most cases.

    We have tested our system using two different data sets, comparing

    it against two other algorithms from the literature. Our

    experimental results show that our system scales up linearly with

    the size of the input database. Furthermore, it exhibits excellent

    adaptability to support threshold decreases. We also show that the

    incremental update capability of the system provides significant

    performance advantages over full re-computation even for

    relatively large update sizes.

    Files
  • El-Sayed.pdf

  • Browse by Author | Browse by Department | Search all available ETDs

    [WPI] [Library] [Home] [Top]

    Questions? Email etd-questions@wpi.edu
    Maintained by webmaster@wpi.edu